Re: ResourceOwner refactoring
Hello Heikki,

I've stumbled upon yet another anomaly introduced with b8bff07da. Please try the following script:

SELECT 'init' FROM pg_create_logical_replication_slot('slot', 'test_decoding');
CREATE TABLE tbl(t text);
BEGIN;
TRUNCATE table tbl;
SELECT data FROM pg_logical_slot_get_changes('slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');

On b8bff07da (and the current HEAD), it ends up with:
ERROR:  ResourceOwnerEnlarge called after release started

But on the previous commit, I get no error:
 data 
------
(0 rows)

Best regards,
Alexander
Re: To what extent should tests rely on VACUUM ANALYZE?
29.03.2024 11:59, Alexander Lakhin wrote:
> But it looks like subselect is not the only test that can fail due to vacuum instability. I see that create_index also suffers from cranky ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0) ...

Just for the record, I think I've reproduced the same failure as:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2024-03-17%2003%3A03%3A57
not ok 66    + create_index                    27509 ms
...
and the similar occurrences:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-01-02%2007%3A09%3A09
not ok 66    + create_index                    25830 ms
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2023-11-15%2006%3A16%3A15
not ok 66    + create_index                    38508 ms
by running 8 027_stream_regress instances in parallel on a slow ARMv7 device like this:
for i in `seq 10`; do echo "I $i"; parallel --halt now,fail=1 -j8 \
  --linebuffer --tag NO_TEMP_INSTALL=1 make -s check -C \
  src/test/recovery_{}/ PROVE_TESTS="t/027*" ::: `seq 8` || break; done

5	#   Failed test 'regression tests pass'
5	#   at t/027_stream_regress.pl line 92.
5	#          got: '256'
5	#     expected: '0'
5	t/027_stream_regress.pl ..
5	Dubious, test returned 1 (wstat 256, 0x100)
5	Failed 1/6 subtests
not ok 66    + create_index                   152995 ms
...
=== dumping .../src/test/recovery_5/tmp_check/regression.diffs ===
diff -U3 .../src/test/regress/expected/create_index.out .../src/test/recovery_5/tmp_check/results/create_index.out
--- .../src/test/regress/expected/create_index.out	2024-05-30 15:30:34.523048633 +
+++ .../src/test/recovery_5/tmp_check/results/create_index.out	2024-05-31 13:07:56.359001362 +
@@ -1916,11 +1916,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                      QUERY PLAN                       
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)
 
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
@@ -1936,12 +1940,13 @@
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
 ORDER BY thousand;
-                       QUERY PLAN                        
---------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)
 
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
=== EOF ===

I got failures on iterations 2, 3, 7, 1. But with the repeated VACUUM ANALYZE:
--- a/src/test/regress/sql/test_setup.sql
+++ b/src/test/regress/sql/test_setup.sql
@@ -163,6 +163,8 @@ CREATE TABLE tenk1 (
 \set filename :abs_srcdir '/data/tenk.data'
 COPY tenk1 FROM :'filename';
 VACUUM ANALYZE tenk1;
+VACUUM ANALYZE tenk1;
+VACUUM ANALYZE tenk1;

20 iterations succeeded in the same environment. So I think that that IOS plan change can be explained by the issue discussed here.

Best regards,
Alexander
The xversion-upgrade test fails to stop server
Hello Andrew,

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut down... failed
pg_ctl: server does not shut down

Looking at:
https://github.com/PGBuildFarm/client-code/blob/05014d50e/PGBuild/Modules/TestUpgradeXversion.pm#L641
I see that ctl4.log is created after updating extensions, and REL9_5_STABLE-update_extensions.log contains:

You are now connected to database "contrib_regression_redis_fdw" as user "buildfarm".
ALTER EXTENSION "hstore" UPDATE;
ALTER EXTENSION
You are now connected to database "contrib_regression_btree_gin" as user "buildfarm".
ALTER EXTENSION "btree_gin" UPDATE;
ALTER EXTENSION
...

but I see no corresponding server log file containing these commands in the failure log. When running the same test locally, I find these commands in inst/upgrade_log. Maybe uploading this log file too would help to understand the cause of the failure...

Best regards,
Alexander
001_rep_changes.pl fails due to publisher stuck on shutdown
Hello hackers,

As a recent buildfarm test failure [1] shows, the publisher node may hang on stopping:

[14:33:02.374](0.333s) ok 23 - update works with dropped subscriber column
### Stopping node "publisher" using mode fast
# Running: pg_ctl -D /home/bf/bf-build/adder/HEAD/pgsql.build/testrun/subscription/001_rep_changes/data/t_001_rep_changes_publisher_data/pgdata -m fast stop
waiting for server to shut down.. ... ... ... .. failed
pg_ctl: server does not shut down
# pg_ctl stop failed: 256
# Postmaster PID for node "publisher" is 549
[14:39:04.375](362.001s) Bail out!  pg_ctl stop failed

001_rep_changes_publisher.log:
2024-05-16 14:33:02.907 UTC [2238704][client backend][4/22:0] LOG:  statement: DELETE FROM tab_rep
2024-05-16 14:33:02.925 UTC [2238704][client backend][:0] LOG:  disconnection: session time: 0:00:00.078 user=bf database=postgres host=[local]
2024-05-16 14:33:02.939 UTC [549][postmaster][:0] LOG:  received fast shutdown request
2024-05-16 14:33:03.000 UTC [549][postmaster][:0] LOG:  aborting any active transactions
2024-05-16 14:33:03.049 UTC [549][postmaster][:0] LOG:  background worker "logical replication launcher" (PID 2223110) exited with exit code 1
2024-05-16 14:33:03.062 UTC [901][checkpointer][:0] LOG:  shutting down
2024-05-16 14:39:04.377 UTC [549][postmaster][:0] LOG:  received immediate shutdown request
2024-05-16 14:39:04.382 UTC [549][postmaster][:0] LOG:  database system is shut down
I reproduced the failure (with aggressive autovacuum) locally and discovered that it happens because:
1) checkpointer calls WalSndInitStopping() (which sends PROCSIG_WALSND_INIT_STOPPING to walsender), and then spins inside WalSndWaitStopping() indefinitely, because:
2) walsender receives the signal, sets got_STOPPING = true, but can't exit WalSndLoop():
3) it never sets got_SIGUSR2 (to get into WalSndDone()) in XLogSendLogical():
4) it never sets WalSndCaughtUp to true:
5) logical_decoding_ctx->reader->EndRecPtr can't reach flushPtr in XLogSendLogical():
6) EndRecPtr doesn't advance in XLogNextRecord():
7) XLogDecodeNextRecord() fails to decode a record that crosses a page boundary:
8) ReadPageInternal() (commented "Wait for the next page to become available") constantly returns XLREAD_FAIL:
9) state->routine.page_read()/logical_read_xlog_page() constantly returns -1:
10) flushptr = WalSndWaitForWal() stops advancing, because got_STOPPING == true (see 2).

That is, the walsender doesn't let itself catch up if it gets the stop signal while it's lagging behind and decoding a record requires reading the next WAL page.

Please look at the attached reproducing test (based on 001_rep_changes.pl). It fails for me as below:

# 17
Bailout called.  Further testing stopped:  pg_ctl stop failed
FAILED--Further testing stopped: pg_ctl stop failed
make: *** [Makefile:21: check] Error 255

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-05-16%2014%3A22%3A38
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-04-24%2014%3A38%3A35 (apparently the same failure)

Best regards,
Alexander

Attachment: 099_walsender_stop.pl (Perl program)
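For readers who want to see why steps 5)-10) form a cycle, here is a toy Python model of the livelock (all names and numbers are simplified stand-ins, not actual PostgreSQL code): once got_STOPPING is set, WalSndWaitForWal() stops waiting for more WAL, so a record that crosses the flush boundary can never be decoded, and the walsender can never report itself caught up.

```python
# Toy model of the walsender shutdown livelock described above.
# All names are illustrative stand-ins for walsender state, not real code.

PAGE_SIZE = 8192

def wal_snd_wait_for_wal(flushed, got_stopping):
    # Models step 10: with got_STOPPING set, WalSndWaitForWal() returns
    # the current flush pointer immediately instead of waiting for more WAL.
    return flushed

def can_decode(record_end, flush_ptr):
    # Models steps 7-9: a record crossing a page boundary can only be
    # decoded once the WAL containing its end is available.
    return flush_ptr >= record_end

def walsender_loop(flushed, end_rec_ptr, record_end, max_iters=1000):
    got_stopping = True        # PROCSIG_WALSND_INIT_STOPPING received (step 2)
    caught_up = False
    for _ in range(max_iters):
        flush_ptr = wal_snd_wait_for_wal(flushed, got_stopping)
        if can_decode(record_end, flush_ptr):
            end_rec_ptr = record_end       # decoding advances EndRecPtr
        if end_rec_ptr >= flush_ptr:
            caught_up = True               # WalSndCaughtUp -> loop can exit
            break
    return caught_up

# The current record starts before the flush pointer but ends on the next,
# never-flushed page: the walsender can neither decode it nor catch up.
print(walsender_loop(flushed=PAGE_SIZE, end_rec_ptr=PAGE_SIZE - 100,
                     record_end=PAGE_SIZE + 100))   # False

# If the record ended at the flush pointer, it would catch up and exit.
print(walsender_loop(flushed=PAGE_SIZE, end_rec_ptr=PAGE_SIZE - 100,
                     record_end=PAGE_SIZE))         # True
```

The model suggests why any fix has to break the cycle at one of the numbered steps, e.g. by letting a stopping walsender still wait for already-started records to become decodable.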
Test 031_recovery_conflict fails when a conflict counted twice
Hello hackers,

As a recent buildfarm test failure on olingo (which builds postgres with -O1 and address sanitizer) [1] shows:
...
[23:12:02.127](0.166s) not ok 6 - snapshot conflict: stats show conflict on standby
[23:12:02.130](0.003s) #   Failed test 'snapshot conflict: stats show conflict on standby'
#   at /home/bf/bf-build/olingo/HEAD/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 332.
[23:12:02.130](0.000s) #          got: '2'
#     expected: '1'
...
[23:12:06.848](1.291s) not ok 17 - 5 recovery conflicts shown in pg_stat_database
[23:12:06.887](0.040s) #   Failed test '5 recovery conflicts shown in pg_stat_database'
#   at /home/bf/bf-build/olingo/HEAD/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 286.
[23:12:06.887](0.000s) #          got: '6'
#     expected: '5'
Waiting for replication conn standby's replay_lsn to pass 0/3459160 on primary
done
...

pgsql.build/testrun/recovery/031_recovery_conflict/log/031_recovery_conflict_standby.log:
2024-05-15 23:12:01.616 UTC [1299981][client backend][2/2:0] LOG:  statement: FETCH FORWARD FROM test_recovery_conflict_cursor;
2024-05-15 23:12:01.617 UTC [1299981][client backend][2/2:0] LOG:  statement: ;
2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] LOG:  recovery still waiting after 15.289 ms: recovery conflict on snapshot
2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] DETAIL:  Conflicting process: 1299981.
2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] CONTEXT:  WAL redo at 0/344F468 for Heap2/PRUNE_VACUUM_SCAN: snapshotConflictHorizon: 746, isCatalogRel: F, nplans: 2, nredirected: 18, ndead: 0, nunused: 0, plans: [{ xmax: 0, infomask: 2816, infomask2: 2, ntuples: 2, offsets: [21, 22] }, { xmax: 0, infomask: 11008, infomask2: 32770, ntuples: 18, offsets: [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58] }], redirected: [23->41, 24->42, 25->43, 26->44, 27->45, 28->46, 29->47, 30->48, 31->49, 32->50, 33->51, 34->52, 35->53, 36->54, 37->55, 38->56, 39->57, 40->58]; blkref #0: rel 1663/16385/16386, blk 0
2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] FATAL:  terminating connection due to conflict with recovery
2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] DETAIL:  User query might have needed to see row versions that must be removed.
2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
vvv an important difference from a successful test run
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] LOG:  could not send data to client: Broken pipe
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] FATAL:  terminating connection due to conflict with recovery
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] DETAIL:  User query might have needed to see row versions that must be removed.
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
^^^

Test 031_recovery_conflict may fail in the following scenario:

031_recovery_conflict.pl:
	executes a query, which produces a conflict:
	## RECOVERY CONFLICT 2: Snapshot conflict
	...
	$psql_standby->query_safe(...)
startup process:
	detects a snapshot conflict and sends PROCSIG_RECOVERY_CONFLICT_SNAPSHOT (ResolveRecoveryConflictWithVirtualXIDs -> CancelVirtualTransaction) to the client backend

client backend:
	receives and processes the signal: HandleRecoveryConflictInterrupt; ProcessClientReadInterrupt -> ProcessInterrupts -> ProcessRecoveryConflictInterrupts -> ProcessRecoveryConflictInterrupt,
	reports the recovery conflict: pgstat_report_recovery_conflict(reason);
	reports the error: ereport(FATAL, ... "terminating connection due to conflict with recovery" ...),
	and sends the message to the server log: errfinish -> EmitErrorReport -> send_message_to_server_log

031_recovery_conflict.pl:
	finds the message in the log and resets the psql connection:
	check_conflict_log("User query might have needed to see row versions that must be removed");
	$psql_standby->reconnect_and_clear();

startup process:
	keeps sending PROCSIG_RECOVERY_CONFLICT_SNAPSHOT to the client backend in a loop inside ResolveRecoveryConflictWithVirtualXIDs

client backend:
	tries to send the message to the client: send_message_to_frontend -> socket_flush -> internal_flush_buffer,
	gets an error (EPIPE) from secure_write, and calls ereport(COMMERROR, (errcode_for_socket_access(), errmsg("could not send data to client: %m")));
	then receives the following PROCSIG_RECOVERY_CONFLICT_SNAPSHOT signal and processes it again, so pgstat_report_recovery_conflict() is called a second time and the same conflict is counted twice
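The double-count scenario above can be condensed into a toy Python model (names are illustrative stand-ins, not PostgreSQL code): the startup process keeps resending the signal until the conflicting backend exits, so if the client has already disconnected, the backend can process the interrupt a second time before it dies.

```python
# Toy model of the double-counted recovery conflict described above.
# All names are illustrative stand-ins, not PostgreSQL code.

conflict_count = 0       # stands in for the pg_stat_database conflict counter
pending_signal = False   # a PROCSIG_RECOVERY_CONFLICT_SNAPSHOT is queued

def startup_resends_signal():
    # The startup process loops in ResolveRecoveryConflictWithVirtualXIDs,
    # resending the signal until the conflicting backend goes away.
    global pending_signal
    pending_signal = True

def process_conflict_interrupt(client_gone):
    """Client backend handling one PROCSIG_RECOVERY_CONFLICT_SNAPSHOT."""
    global conflict_count, pending_signal
    pending_signal = False
    conflict_count += 1             # pgstat_report_recovery_conflict(reason)
    if client_gone:
        # ereport(FATAL) can't reach the client (EPIPE -> COMMERROR); before
        # the backend exits, the resent signal arrives and is handled again.
        startup_resends_signal()
        if pending_signal:
            pending_signal = False
            conflict_count += 1     # the same conflict is counted twice

# Normal case: the client is still connected; one conflict is counted.
process_conflict_interrupt(client_gone=False)
assert conflict_count == 1

# Failure case: the test already reconnected psql; the FATAL report hits a
# broken pipe and the conflict is counted twice, as the buildfarm log shows.
conflict_count = 0
process_conflict_interrupt(client_gone=True)
print(conflict_count)   # 2
```

This matches the observed failure ("got: '2' / expected: '1'" and the 5-vs-6 total): the stats are off by exactly one per doubly-processed conflict.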
Re: Improving tracking/processing of buildfarm test failures
Hello Amit and Noah,

24.05.2024 14:15, Amit Kapila wrote:
> I feel it is a good idea to do something about this. It makes sense to start with something simple and see how it works. I think this can also help us decide whether we need to chase a particular BF failure immediately after committing.

24.05.2024 23:00, Noah Misch wrote:
>> (I could start with the second approach, if you don't mind, and we'll see how it works.)
>
> Certainly you doing (2) can only help, though it may help less than (1).

Thank you for paying attention to this!

I've created such a page to accumulate information on test failures:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures

I've deliberately added a trivial issue with partition_split, which is doomed to be fixed soon, to test the information workflow, and I'm going to add a few other items in the coming days.

Please share your comments and suggestions, if any.

Best regards,
Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hello,

24.05.2024 22:29, Tom Lane wrote:
> The partition_split test has unstable results, as shown at [1]. I suggest adding "ORDER BY conname" to the two queries shown to fail there. Better look at other queries in the test for possible similar problems, too.

Yes, I've just reproduced it on an aarch64 device as follows:

echo "autovacuum_naptime = 1
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
" > ~/temp.config
TEMP_CONFIG=~/temp.config TESTS="$(printf 'partition_split %.0s' `seq 100`)" make -s check-tests
...
ok 80        - partition_split                            749 ms
not ok 81    - partition_split                            728 ms
ok 82        - partition_split                            732 ms

$ cat src/test/regress/regression.diffs
diff -U3 .../src/test/regress/expected/partition_split.out .../src/test/regress/results/partition_split.out
--- .../src/test/regress/expected/partition_split.out	2024-05-15 17:15:57.171999830 +
+++ .../src/test/regress/results/partition_split.out	2024-05-24 19:28:37.32749 +
@@ -625,8 +625,8 @@
 SELECT pg_get_constraintdef(oid), conname, conkey FROM pg_constraint WHERE conrelid = 'sales_feb_mar_apr2022'::regclass::oid;
                        pg_get_constraintdef                         |             conname             | conkey 
--------------------------------------------------------------------+---------------------------------+--------
- CHECK ((sales_amount > 1))                                         | sales_range_sales_amount_check  | {2}
  FOREIGN KEY (salesperson_id) REFERENCES salespeople(salesperson_id) | sales_range_salesperson_id_fkey | {1}
+ CHECK ((sales_amount > 1))                                         | sales_range_sales_amount_check  | {2}
 (2 rows)
 
 ALTER TABLE sales_range SPLIT PARTITION sales_feb_mar_apr2022 INTO

Best regards,
Alexander
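For illustration, Tom's suggested stabilization would look something like this (a sketch only — the exact query text in partition_split.sql may differ):

```sql
-- Hypothetical sketch of the suggested fix: make the row order
-- deterministic so concurrent autovacuum activity cannot change
-- which constraint pg_constraint returns first.
SELECT pg_get_constraintdef(oid), conname, conkey
  FROM pg_constraint
  WHERE conrelid = 'sales_feb_mar_apr2022'::regclass::oid
  ORDER BY conname;
```

With ORDER BY conname, the CHECK and FOREIGN KEY rows always come out in the same order, so the expected output no longer depends on catalog scan order.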
Improving tracking/processing of buildfarm test failures
Hello hackers,

I'd like to discuss ways to improve the buildfarm experience for anyone who is interested in using the information the buildfarm gives us.

Unless I'm missing something, as of now there are no means to determine whether some concrete failure is known/investigated or fixed, how frequently it occurs, and so on... From my experience, it's not that unbelievable that some failure which occurred two years ago and was lost in time was an indication of, e.g., a race condition still existing in the code/tests and thus worth fixing. But without classifying/marking failures, it's hard to find such an interesting failure among many others...

The first way to improve things I can imagine is to add two fields to the buildfarm database: a link to the failure discussion (set when the failure is investigated/reproduced and reported in -bugs or -hackers) and a commit id/link (set when the failure is fixed). I understand that this requires modifying the buildfarm code and adding some UI to update these fields, but it would allow adding filters to see only unknown/non-investigated failures in the buildfarm web interface later.

The second way is to create a wiki page, similar to "PostgreSQL 17 Open Items", say, "Known buildfarm test failures", and fill it like below:
...
Useful info from the failure logs for reference
...
---

This way is less invasive, but it would work well only if most of the interested people know of it/use it.

(I could start with the second approach, if you don't mind, and we'll see how it works.)

Best regards,
Alexander
Re: Testing autovacuum wraparound (including failsafe)
Hello,

30.11.2023 10:35, Masahiko Sawada wrote:
>> I've attached new version patches (0002 and 0003 are unchanged except for the commit message). I'll push them, barring any objections.
>
> Pushed.

I've discovered that the test 001_emergency_vacuum.pl can fail due to a race condition. I can't see the server log at [1], but I reproduced the failure locally, and with additional logging and log_min_messages = DEBUG3, the log shows:

...
2024-05-22 11:46:28.125 UTC [21256:2853] DEBUG:  SlruScanDirectory invoking callback on pg_xact/0690
2024-05-22 11:46:28.125 UTC [21256:2854] DEBUG:  transaction ID wrap limit is 2147484396, limited by database with OID 5
2024-05-22 11:46:28.126 UTC [21256:2855] LOG:  !!!SendPostmasterSignal| PMSIGNAL_START_AUTOVAC_LAUNCHER
2024-05-22 11:46:28.135 UTC [14871:20077] DEBUG:  postmaster received pmsignal signal
2024-05-22 11:46:28.137 UTC [21257:1] DEBUG:  autovacuum launcher started
2024-05-22 11:46:28.137 UTC [21257:2] DEBUG:  InitPostgres
2024-05-22 11:46:28.138 UTC [21257:3] LOG:  !!!AutoVacLauncherMain| !AutoVacuumingActive() && !ShutdownRequestPending; before do_start_worker()
2024-05-22 11:46:28.138 UTC [21257:4] LOG:  !!!do_start_worker| return quickly when there are no free workers
2024-05-22 11:46:28.138 UTC [21257:5] DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:6] DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:7] DEBUG:  proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.138 UTC [21257:8] DEBUG:  exit(0)
2024-05-22 11:46:28.138 UTC [21257:9] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:10] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:11] DEBUG:  proc_exit(-1): 0 callbacks to make
2024-05-22 11:46:28.146 UTC [21256:2856] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 5
2024-05-22 11:46:28.146 UTC [21256:2857] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2024-05-22 11:46:28.146 UTC [21256:2858] DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2859] DEBUG:  shmem_exit(0): 7 on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2860] DEBUG:  proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2861] DEBUG:  exit(0)
2024-05-22 11:46:28.147 UTC [21256:2862] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2863] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2864] DEBUG:  proc_exit(-1): 0 callbacks to make
2024-05-22 11:46:28.151 UTC [14871:20078] DEBUG:  forked new backend, pid=21258 socket=8
2024-05-22 11:46:28.171 UTC [14871:20079] DEBUG:  server process (PID 21256) exited with exit code 0
2024-05-22 11:46:28.152 UTC [21258:1] [unknown] LOG:  connection received: host=[local]
2024-05-22 11:46:28.171 UTC [21258:2] [unknown] DEBUG:  InitPostgres
2024-05-22 11:46:28.172 UTC [21258:3] [unknown] LOG:  connection authenticated: user="vagrant" method=trust (/pgtest/postgresql.git/src/test/modules/xid_wraparound/tmp_check/t_001_emergency_vacuum_main_data/pgdata/pg_hba.conf:117)
2024-05-22 11:46:28.172 UTC [21258:4] [unknown] LOG:  connection authorized: user=vagrant database=postgres application_name=001_emergency_vacuum.pl
2024-05-22 11:46:28.175 UTC [21258:5] 001_emergency_vacuum.pl LOG:  statement: INSERT INTO small(data) SELECT 1

That is, the autovacuum worker (21256) sent PMSIGNAL_START_AUTOVAC_LAUNCHER, and the postmaster started the autovacuum launcher, which could not start a new autovacuum worker because process 21256 had not exited yet.
The failure can be reproduced easily with a sleep added inside SetTransactionIdLimit():

 	if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
 		IsUnderPostmaster && !InRecovery)
 		SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
+	pg_usleep(1L);

By the way, I also discovered that the rather resource-intensive xid_wraparound tests are executed twice during the buildfarm "make" run (on the dodo and perentie animals (see [2])): first at stage module-xid_wraparound-check and then at stage testmodules-install-check-C.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-19%2006%3A33%3A34
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=perentie&dt=2024-05-22%2000%3A02%3A19

Best regards,
Alexander
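The race can be condensed into a toy Python model (all names and limits are illustrative stand-ins, not PostgreSQL code): the worker signals the postmaster before it exits and frees its slot, so a launcher started in that window finds no free worker and gives up.

```python
# Toy model of the autovacuum launcher race described above.
# All names are illustrative stand-ins, not PostgreSQL code.

MAX_WORKERS = 1   # assume a single worker slot for this toy scenario

def do_start_worker(busy_workers):
    # Models the "!!!do_start_worker| return quickly when there are no
    # free workers" log line: nothing is started when all slots are busy.
    return busy_workers < MAX_WORKERS

# The old worker (PID 21256 in the log) sends
# PMSIGNAL_START_AUTOVAC_LAUNCHER from SetTransactionIdLimit()
# *before* it exits and frees its worker slot.
busy_workers = 1

# The launcher starts in response to the signal while the old worker is
# still alive, fails to start a new worker, and exits.
launched = do_start_worker(busy_workers)

busy_workers -= 1   # only now does the old worker exit (too late)

print(launched)   # False: the emergency vacuum worker was never started
```

The pg_usleep() injection above simply widens this window between the signal and the worker's exit, making the race reproducible on demand.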
Re: Cleaning up perl code
Hello Dagfinn,

Thank you for paying attention to it and improving the possible fix!

20.05.2024 23:39, Dagfinn Ilmari Mannsåker wrote:
> Nice cleanup! Did you use some static analysis tool, or did you look for them manually?

I reviewed my collection of unica I had gathered for several months, but found some of them too minor/requiring more analysis. Then I added more with perlcritic's policy UnusedVariables, and also checked for unused subs with a script from blogs.perl.org (and it confirmed my only previous find of that kind).

> If I add [Variables::ProhibitUnusedVariables] to src/tools/perlcheck/perlcriticrc, it finds a few more, see the attached patch.

Yes, I saw the unused $sqlstates, but decided that they are meaningful enough to stay. Though maybe enabling ProhibitUnusedVariables justifies fixing them too.

> The scripts parsing errcodes.txt really should be refactored into using a common module, but that's a patch for another day.

Agreed, and I would leave 005_negotiate_encryption.pl (with $node_conf and $server_config unused since d39a49c1e) aside for another day too.

Best regards,
Alexander
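For reference, enabling the policy Dagfinn mentions would be a small addition to src/tools/perlcheck/perlcriticrc (a sketch; the severity line is my assumption about how the file raises individual policies above its threshold, and the actual layout may differ):

```
# Sketch: flag variables that are declared but never used.
[Variables::ProhibitUnusedVariables]
severity = 5
```

Once added, a plain run of src/tools/perlcheck would report the remaining unused variables, including the $sqlstates cases discussed above.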
Cleaning up perl code
Hello hackers,

Please look at a bunch of unused variables and a couple of other defects I found in the perl code; maybe you'll find them worth fixing:

contrib/amcheck/t/001_verify_heapam.pl
	$result  # unused since introduction in 866e24d47
	unused sub: get_toast_for  # not used since 860593ec3
contrib/amcheck/t/002_cic.pl
	$result  # orphaned since 7f580aa5d
src/backend/utils/activity/generate-wait_event_types.pl
	$note, $note_name  # unused since introduction in fa8892847
src/bin/pg_dump/t/003_pg_dump_with_server.pl
	$cmd, $stdout, $stderr, $result  # unused since introduction in 2f9eb3132
src/bin/pg_dump/t/005_pg_dump_filterfile.pl
	$cmd, $stdout, $stderr, $result  # unused since introduction in a5cf808be
src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
	$slapd, $ldap_schema_dir  # unused since introduction in 419a8dd81
src/test/modules/ssl_passphrase_callback/t/001_testfunc.pl
	$clearpass  # orphaned since b846091fd
src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
	$ostmt  # unused since introduction in 47bb9db75
src/test/recovery/t/021_row_visibility.pl
	$ret  # unused since introduction in 7b28913bc
src/test/recovery/t/032_relfilenode_reuse.pl
	$ret  # unused since introduction in e2f65f425
src/test/recovery/t/035_standby_logical_decoding.pl
	$stdin, $ret, $slot  # unused since introduction in fcd77d532
	$subscriber_stdin, $subscriber_stdout, $subscriber_stderr  # unused since introduction in 376dc8205
src/test/subscription/t/024_add_drop_pub.pl
	invalid reference in a comment: tab_drop_refresh -> tab_2  # introduced with 1046a69b3 (invalid since v6-0001-...patch in the commit's thread)
src/tools/msvc_gendef.pl
	@def  # unused since introduction in 933b46644?

I've attached a patch with all of these changes (tested with a meson build on Windows and check-world on Linux).
Best regards,
Alexander

diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 9de3148277..481e4dbe4f 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -9,7 +9,7 @@
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($node, $result);
+my $node;
 
 #
 # Test set-up
@@ -87,19 +87,6 @@ sub relation_filepath
 	return "$pgdata/$rel";
 }
 
-# Returns the fully qualified name of the toast table for the named relation
-sub get_toast_for
-{
-	my ($relname) = @_;
-
-	return $node->safe_psql(
-		'postgres', qq(
-        SELECT 'pg_toast.' || t.relname
-        FROM pg_catalog.pg_class c, pg_catalog.pg_class t
-        WHERE c.relname = '$relname'
-          AND c.reltoastrelid = t.oid));
-}
-
 # (Re)create and populate a test table of the given name.
 sub fresh_test_table
 {
diff --git a/contrib/amcheck/t/002_cic.pl b/contrib/amcheck/t/002_cic.pl
index 53a3db9745..0207407ca0 100644
--- a/contrib/amcheck/t/002_cic.pl
+++ b/contrib/amcheck/t/002_cic.pl
@@ -10,7 +10,7 @@
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($node, $result);
+my $node;
 
 #
 # Test set-up
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 42f36f405b..b2d61bd8ba 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -42,8 +42,6 @@
 my @abi_compatibility_lines;
 my @lines;
 my $abi_compatibility = 0;
 my $section_name;
-my $note;
-my $note_name;
 
 # Remove comments and empty lines and add waitclassname based on the section
 while (<$wait_event_names>)
diff --git a/src/bin/pg_dump/t/003_pg_dump_with_server.pl b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
index b5a1445550..3f549f44e7 100644
--- a/src/bin/pg_dump/t/003_pg_dump_with_server.pl
+++ b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
@@ -26,7 +26,6 @@
 $node->safe_psql('postgres', "CREATE SERVER s1 FOREIGN DATA WRAPPER dummy");
 $node->safe_psql('postgres', "CREATE SERVER s2 FOREIGN DATA WRAPPER dummy");
 $node->safe_psql('postgres', "CREATE FOREIGN TABLE t0 (a int) SERVER s0");
 $node->safe_psql('postgres', "CREATE FOREIGN TABLE t1 (a int) SERVER s1");
-my ($cmd, $stdout, $stderr, $result);
 
 command_fails_like(
 	[ "pg_dump", '-p', $port, '--include-foreign-data=s0', 'postgres' ],
diff --git a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
index a80e13a0d3..6025bb296c 100644
--- a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
+++ b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
@@ -81,7 +81,6 @@ $node->safe_psql('sourcedb',
 
 #
 # Test interaction of correctly specified filter file
 #
-my ($cmd, $stdout, $stderr, $result);
 
 # Empty filterfile
 open $inputfile, '>', "$tempdir/inputfile.txt"
diff --git a/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl b/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
index b990e7d101..82b1cb88e9 100644
--- a/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
+++ b/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
@@ -12,10 +12,6 @@
 use Test::More;
Re: Avoid orphaned objects dependencies, take 3
Hello Bertrand,

15.05.2024 11:31, Bertrand Drouvot wrote:
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
>> +if (object_description)
>> +ereport(ERROR, errmsg("%s does not exist", object_description));
>> +else
>> +ereport(ERROR, errmsg("a dependent object does not ex
>>
>> This generates an internal error code.  Is that intended?
>
> Yes, it's like when, say, you want to create an object in a schema that does not exist (see get_namespace_oid()).

AFAICS, get_namespace_oid() throws not ERRCODE_INTERNAL_ERROR, but ERRCODE_UNDEFINED_SCHEMA:

# SELECT regtype('unknown_schema.type');
ERROR:  schema "unknown_schema" does not exist
LINE 1: SELECT regtype('unknown_schema.type');
                       ^
# \echo :LAST_ERROR_SQLSTATE
3F000

Probably, it's worth avoiding ERRCODE_INTERNAL_ERROR here in light of [1] and [2], as these errors are not that abnormal (not Assert-like).

[1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
[2] https://commitfest.postgresql.org/48/4735/

Best regards,
Alexander
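For illustration, the quoted ereport() calls could carry an explicit errcode, following the get_namespace_oid() precedent — a sketch only (ERRCODE_UNDEFINED_OBJECT is my guess at an appropriate code, and the second message is completed hypothetically; the patch authors may prefer otherwise):

```c
/* Sketch: attach a non-internal SQLSTATE, as get_namespace_oid() does
 * with ERRCODE_UNDEFINED_SCHEMA.  Without an errcode() call,
 * ereport(ERROR, ...) defaults to ERRCODE_INTERNAL_ERROR (XX000). */
if (object_description)
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_OBJECT),
             errmsg("%s does not exist", object_description)));
else
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_OBJECT),
             errmsg("a dependent object does not exist")));
```

That way clients would see a meaningful SQLSTATE instead of XX000, which monitoring tools commonly treat as a server bug.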
Re: Underscore in positional parameters?
Hello Erik,

18.05.2024 04:31, Erik Wienhold wrote:
> On 2024-05-17 02:06 +0200, Michael Paquier wrote:
>> On Thu, May 16, 2024 at 08:41:11AM +0200, Peter Eisentraut wrote:
>>> On this specific patch, maybe reword "parameter too large" to "parameter number too large".
>>
>> WFM here.
>
> Done in v3.

Thank you for working on this! I encountered the anomalies that you address with this patch too. And I can confirm that it fixes most cases, but there is another one:

SELECT $3 \bind 'foo' \g
ERROR:  invalid memory alloc request size 12

Maybe you would find this worth fixing as well.

Best regards,
Alexander
Re: Streaming read-ready sequential scan code
Hello Thomas,

18.05.2024 07:47, Thomas Munro wrote:
> After more debugging, we learned a lot more things...
>
> 1. That query produces spectacularly bad estimates, so we finish up having to increase the number of buckets in a parallel hash join many times. That is quite interesting, but unrelated to new code.
>
> 2. Parallel hash join is quite slow at negotiating an increase in the number of hash buckets, if all of the input tuples are being filtered out by quals, because of the choice of where workers check for PHJ_GROWTH_NEED_MORE_BUCKETS. That could be improved quite easily I think. I have put that on my todo list 'cause that's also my code, but it's not a new issue, it's just one that is now highlighted...
>
> 3. This bit of read_stream.c is exacerbating unfairness in the underlying scan, so that 1 and 2 come together and produce a nasty slowdown, which goes away if you change it like so:
>
> -       BlockNumber blocknums[16];
> +       BlockNumber blocknums[1];
>
> I will follow up after some more study.

Thank you for the information!

Unfortunately, I can't see significant differences in my environment with parallel_leader_participation=off.

With blocknums[1], the timing changes, but the effect is not persistent. 10 query15 executions in a row:

b7b0f3f27:
277.932 ms, 281.805 ms, 278.335 ms, 281.565 ms, 284.167 ms, 283.171 ms, 281.165 ms, 281.615 ms, 285.394 ms, 277.301 ms

b7b0f3f27~1:
159.789 ms, 165.407 ms, 160.893 ms, 159.343 ms, 160.936 ms, 161.577 ms, 161.637 ms, 163.421 ms, 163.143 ms, 167.109 ms

b7b0f3f27 + blocknums[1]:
164.133 ms, 280.920 ms, 160.748 ms, 163.182 ms, 161.709 ms, 161.998 ms, 161.239 ms, 276.256 ms, 161.601 ms, 160.384 ms

I placed PGDATA on tmpfs to rule out any blockdev specifics (increasing blockdev ra from 256 to 4096 didn't help me with PGDATA on NVMe either).

Best regards,
Alexander
Re: Streaming read-ready sequential scan code
Hello,

I decided to compare v17 vs v16 performance (as I did last year [1]) and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds) benchmark, query15 (and several others, but I focused on this one):

Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > 151.03): pg_tpcds.query15
Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > 152.64): pg_tpcds.query15

Please look at the full html report attached in case you're interested. (I used my pg-mark tool to measure/analyze performance, but I believe the same results can be seen without it.)

`git bisect` for this performance degradation pointed at b7b0f3f27...

[1] https://www.postgresql.org/message-id/b32bed1b-0746-9b20-1472-4bdc9ca66d52%40gmail.com

Best regards,
Alexander

[Attachment: "Postgres benchmarking results" — an HTML report of a run from 2024-05-14T12:04:26 to 2024-05-15T05:44:15 comparing PostgreSQL 16.3 (REL_16_STABLE/c1664c8eef) and 17devel (master/2e810bdb7f) builds, three instances each, across the pgbench_native, pgbench_reference, pg_tpch, and pg_tpcds benchmarks.]
Re: Why is citext/regress failing on hamerkop?
Hello Thomas, 16.05.2024 04:32, Thomas Munro wrote: On Thu, May 16, 2024 at 10:43 AM Thomas Munro wrote: Any chance you could test this version please Alexander? Sorry, cancel that. v3 is not good. I assume it fixes the GSSAPI thing and is superficially better, but it doesn't handle code that calls twice in a row and ignores the first result (I know that PostgreSQL does that occasionally in a few places), and it's also broken if someone gets recv() = 0 (EOF), and then decides to wait anyway. The only way I can think of to get fully reliable poll()-like semantics is to do that peek every time, OR the complicated patch (per-socket-workspace + intercepting recv etc). So I'm back to v2. I've tested v2 and can confirm that it works the same as v1: `vcregress check` passes with no failures on REL_16_STABLE, `meson test` with the basic configuration too. By the way, hamerkop is not configured to enable gssapi for HEAD [1] and I could not enable gss locally yet (just passing extra_lib_dirs, extra_include_dirs doesn't work for me). It looks like we need to find a way to enable it for meson to continue testing v17+ with GSS on Windows. [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hamerkop&dt=2024-05-12%2011%3A00%3A28&stg=configure Best regards, Alexander
Re: Why is citext/regress failing on hamerkop?
15.05.2024 01:26, Thomas Munro wrote: OK, so we know what the problem is here. Here is the simplest solution I know of for that problem. I have proposed this in the past and received negative feedback because it's a really gross hack. But I don't personally know what else to do about the back-branches (or even if that complex solution is the right way forward for master). The attached kludge at least has the [de]merit of being a mirror image of the kludge that follows it for the "opposite" event. Does this fix it? Yes, I see that abandoned GSS connections are closed immediately, as expected. I have also confirmed that `meson test` with the basic configuration passes on REL_16_STABLE. So from the outside, the fix looks good to me. Thank you for working on this! Best regards, Alexander
Re: Why is citext/regress failing on hamerkop?
14.05.2024 17:38, Tom Lane wrote: As I mentioned in our off-list discussion, I have a lingering feeling that this v14 commit could be affecting the results somehow: Author: Tom Lane Branch: master Release: REL_14_BR [d5a9a661f] 2020-10-18 12:56:43 -0400 Update the Winsock API version requested by libpq. According to Microsoft's documentation, 2.2 has been the current version since Windows 98 or so. Moreover, that's what the Postgres backend has been requesting since 2004 (cf commit 4cdf51e64). So there seems no reason for libpq to keep asking for 1.1. I didn't believe at the time that that'd have any noticeable effect, but maybe it somehow made Winsock play a bit nicer with the GSS support? Yes, probably, though maybe not nicer, as the test duration increased? Still, I can't see the difference locally to check that commit. Will try other VMs/configurations, maybe I can find the missing factor... Best regards, Alexander
Re: Avoid orphaned objects dependencies, take 3
Hi Bertrand, 09.05.2024 15:20, Bertrand Drouvot wrote: Oh I see, your test updates an existing dependency. v4 took care about brand new dependencies creation (recordMultipleDependencies()) but forgot to take care about changing an existing dependency (which is done in another code path: changeDependencyFor()). Please find attached v5 that adds: - a call to the new depLockAndCheckObject() function in changeDependencyFor(). - a test when altering an existing dependency. With v5 applied, I don't see the issue anymore. Me too. Thank you for the improved version! I will test the patch in the background, but for now I see no other issues with it. Best regards, Alexander
Re: Why is citext/regress failing on hamerkop?
13.05.2024 23:17, Tom Lane wrote: 3. We don't know exactly why hamerkop suddenly started seeing these failures, but a plausible theory emerges after noting that its reported time for the successful "make check" step dropped pretty substantially right when this started. In the v13 branch, "make check" was taking 2:18 or more in the several runs right before the first isolationcheck failure, but 1:40 or less just after. So it looks like the animal was moved onto faster hardware. That feeds into this problem because, with a successful isolationcheck run taking more than a minute, there was enough time for some of the earlier stuck sessions to time out and exit before their postmaster-child slots were needed. Yes, and one thing I can't explain yet is why REL_14_STABLE+ timings substantially differ from REL_13_STABLE-, say, for the check stage: REL_14_STABLE: the oldest available test log (from 2021-10-30) shows check (00:03:47) and the newest one (from 2024-05-12): check (00:03:18). Whilst on REL_13_STABLE the oldest available log (from 2021-08-06) shows check duration 00:03:00, then it decreased to 00:02:24 (2021-09-22)/00:02:14 (2021-11-07), and now it's 1:40, as you said. Locally I see more or less the same timings on REL_13_STABLE (34, 28, 27 secs) and on REL_14_STABLE (33, 29, 29 secs). 14.05.2024 03:38, Thomas Munro wrote: I was beginning to suspect that lingering odour myself. I haven't looked at the GSS code but I was imagining that what we have here is perhaps not unsent data dropped on the floor due to linger policy (unclean socket close on process exit), but rather that the server didn't see the socket as ready to read because it lost track of the FD_CLOSE somewhere because the client closed it gracefully, and our server-side FD_CLOSE handling has always been a bit suspect. I wonder if the GSS code is somehow more prone to brokenness.
One thing we learned in earlier problems was that abortive/error disconnections generate FD_CLOSE repeatedly, while graceful ones give you only one. In other words, if the other end politely calls closesocket(), the server had better not miss the FD_CLOSE event, because it won't come again. That's what https://commitfest.postgresql.org/46/3523/ is intended to fix. Does it help here? Unfortunately that's unpleasantly complicated and unbackpatchable (keeping a side-table of socket FDs and event handles, so we don't lose events between the cracks). Yes, that cure helps here too. I've tested it on b282fa88d~1 (the last state when that patch set can be applied). An excerpt (all lines related to process 12500) from a failed run log without the patch set: 2024-05-14 05:57:29.526 UTC [8228:128] DEBUG: forked new backend, pid=12500 socket=5524 2024-05-14 05:57:29.534 UTC [12500:1] [unknown] LOG: connection received: host=::1 port=51394 2024-05-14 05:57:29.534 UTC [12500:2] [unknown] LOG: !!!BackendInitialize| before ProcessStartupPacket 2024-05-14 05:57:29.534 UTC [12500:3] [unknown] LOG: !!!ProcessStartupPacket| before secure_open_gssapi(), GSSok: G 2024-05-14 05:57:29.534 UTC [12500:4] [unknown] LOG: !!!secure_open_gssapi| before read_or_wait 2024-05-14 05:57:29.534 UTC [12500:5] [unknown] LOG: !!!read_or_wait| before secure_raw_read(); PqGSSRecvLength: 0, len: 4 2024-05-14 05:57:29.534 UTC [12500:6] [unknown] LOG: !!!read_or_wait| after secure_raw_read: -1, errno: 10035 2024-05-14 05:57:29.534 UTC [12500:7] [unknown] LOG: !!!read_or_wait| before WaitLatchOrSocket() 2024-05-14 05:57:29.549 UTC [12500:8] [unknown] LOG: !!!read_or_wait| after WaitLatchOrSocket 2024-05-14 05:57:29.549 UTC [12500:9] [unknown] LOG: !!!read_or_wait| before secure_raw_read(); PqGSSRecvLength: 0, len: 4 2024-05-14 05:57:29.549 UTC [12500:10] [unknown] LOG: !!!read_or_wait| after secure_raw_read: 0, errno: 10035 2024-05-14 05:57:29.549 UTC [12500:11] [unknown] LOG: !!!read_or_wait| before 
WaitLatchOrSocket() ... 2024-05-14 05:57:52.024 UTC [8228:3678] DEBUG: server process (PID 12500) exited with exit code 1 # at the end of the test run And an excerpt (all lines related to process 11736) from a successful run log with the patch set applied: 2024-05-14 06:03:57.216 UTC [4524:130] DEBUG: forked new backend, pid=11736 socket=5540 2024-05-14 06:03:57.226 UTC [11736:1] [unknown] LOG: connection received: host=::1 port=51914 2024-05-14 06:03:57.226 UTC [11736:2] [unknown] LOG: !!!BackendInitialize| before ProcessStartupPacket 2024-05-14 06:03:57.226 UTC [11736:3] [unknown] LOG: !!!ProcessStartupPacket| before secure_open_gssapi(), GSSok: G 2024-05-14 06:03:57.226 UTC [11736:4] [unknown] LOG: !!!secure_open_gssapi| before read_or_wait 2024-05-14 06:03:57.226 UTC [11736:5] [unknown] LOG: !!!read_or_wait| before secure_raw_read(); PqGSSRecvLength: 0, len: 4 2024-05-14 06:03:57.226 UTC [11736:6] [unknown] LOG: !!!read_or_wait| after secure_raw_read: -1, errno: 10035 2024-05-14 06:03:57.226 UTC [11736:7] [unknown] LOG: !!!read_or_wait| before WaitLatchOrSocket()
Re: Why is citext/regress failing on hamerkop?
Hello Tom, 12.05.2024 08:34, Tom Lane wrote: BTW, I've also been wondering why hamerkop has been failing isolation-check in the 12 and 13 branches for the last six months or so. It is surely unrelated to this issue, and it looks like it must be due to some platform change rather than anything we committed at the time. I'm not planning on looking into that question myself, but really somebody ought to. Or is Windows just as dead as AIX, in terms of anybody being willing to put effort into supporting it? I've reproduced the failure locally with GSS enabled, so I'll try to figure out what's going on here in the next few days. Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hello Dmitry and Alexander, Please look at one more anomaly with temporary tables: CREATE TEMP TABLE t (a int) PARTITION BY RANGE (a); CREATE TEMP TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TEMP TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; -- succeeds, but: ALTER TABLE t SPLIT PARTITION tp_0 INTO (PARTITION tp_0 FOR VALUES FROM (0) TO (1), PARTITION tp_1 FOR VALUES FROM (1) TO (2)); -- fails with: ERROR: relation "tp_0" already exists Though the same SPLIT succeeds with non-temporary tables... Best regards, Alexander
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
11.05.2024 07:25, Thomas Munro wrote: On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin wrote: 11.05.2024 06:26, Thomas Munro wrote: Perhaps a no-image, no-change registered buffer should not be including an image, even for XLR_CHECK_CONSISTENCY? It's actually useless for consistency checking too I guess, this issue aside, because it doesn't change anything so there is nothing to check. Yes, I think something wrong is here. I've reduced the reproducer to: Does it reproduce if you do this? - include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0; + include_image = needs_backup || + ((info & XLR_CHECK_CONSISTENCY) != 0 && +(regbuf->flags & REGBUF_NO_CHANGE) == 0); No, it doesn't (at least with the latter, more targeted reproducer). Unfortunately the back branches don't have that new flag from 00d7fb5e so, even if this is the right direction (not sure, I don't understand this clean registered buffer trick) then ... but wait, why are there are no failures like this in the back branches (yet at least)? Does your reproducer work for 16? I wonder if something relevant changed recently, like f56a9def. CC'ing Michael and Amit K for info. Maybe it's hard to hit (autovacuum needs to process the index page in a narrow time frame), but locally I could reproduce the issue even on ac27c74de(~1 too) from 2018-09-06 (I tried several last commits touching hash indexes, didn't dig deeper). Best regards, Alexander
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
Hello Thomas and Andres, Thank you for looking at this! 11.05.2024 06:26, Thomas Munro wrote: On Sat, May 11, 2024 at 3:57 AM Andres Freund wrote: On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote: and discovered that XLogRecordAssemble() calculates CRC over a buffer that might be modified by another process. If, with "might", you mean that it's legitimate for that buffer to be modified, I don't think so. The bug is that something is modifying the buffer despite it being exclusively locked. I.e. what we likely have here is a bug somewhere in the hash index code. I don't have a good grip on the higher level locking protocols of hash.c, but one microscopic thing jumps out: /* * bucket buffer was not changed, but still needs to be * registered to ensure that we can acquire a cleanup lock on * it during replay. */ if (!xlrec.is_primary_bucket_page) { uint8 flags = REGBUF_STANDARD | REGBUF_NO_IMAGE | REGBUF_NO_CHANGE; XLogRegisterBuffer(0, bucket_buf, flags); } That registers a buffer that is pinned but not content-locked, and we tell xloginsert.c not to copy its image into the WAL, but it does it anyway because: /* * If needs_backup is true or WAL checking is enabled for current * resource manager, log a full-page write for the current block. */ include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0; So I guess it copies the image on dodo, which has: 'PG_TEST_EXTRA' => 'ssl ldap kerberos wal_consistency_checking libpq_encryption xid_wraparound' Perhaps a no-image, no-change registered buffer should not be including an image, even for XLR_CHECK_CONSISTENCY? It's actually useless for consistency checking too I guess, this issue aside, because it doesn't change anything so there is nothing to check. Yes, I think something is wrong here.
I've reduced the reproducer to: cat << 'EOF' | psql CREATE TABLE hash_cleanup_heap(keycol INT); CREATE INDEX hash_cleanup_index on hash_cleanup_heap USING HASH (keycol); BEGIN; INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; ROLLBACK; EOF cat << 'EOF' | psql & INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; DROP TABLE hash_cleanup_heap; EOF cat << 'EOF' | psql & SELECT pg_sleep(random() / 20); VACUUM hash_cleanup_heap; EOF wait grep 'TRAP:' server.log ("wal_consistency_checking = all" and the xloginsert patch are still required) and with additional logging I see: 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| scan page buf: 1832 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of relation "public.hash_cleanup_heap" 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT: VACUUM hash_cleanup_heap; 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: !!!_hash_doinsert| _hash_getbucketbuf_from_hashkey: 1822 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT: INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| xlrec.is_primary_bucket_page: 0, buf: 1832, bucket_buf: 1822 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of relation "public.hash_cleanup_heap" 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT: VACUUM hash_cleanup_heap; 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: !!!_hash_doinsert| _hash_relbuf(rel, 1822) 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT: INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: "xloginsert.c", Line: 949, PID: 
2070237 Best regards, Alexander
WAL record CRC calculated incorrectly because of underlying buffer modification
Hello hackers, I've investigated a recent buildfarm failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-02%2006%3A40%3A36 where the test failed due to a CRC error: 2024-05-02 17:08:18.401 ACST [3406:7] LOG: incorrect resource manager data checksum in record at 0/F14D7A60 (Chipmunk produced similar errors as well: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2022-08-25%2019%3A40%3A11 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2024-03-22%2003%3A20%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-19%2006%3A43%3A20 ) and discovered that XLogRecordAssemble() calculates CRC over a buffer that might be modified by another process. With the attached patch applied, the following test run:
echo "
autovacuum_naptime = 1
autovacuum_vacuum_threshold = 1
wal_consistency_checking = all
" >/tmp/temp.config
for ((i=1;i<=100;i++)); do echo "iteration $i"; TEMP_CONFIG=/tmp/temp.config TESTS="test_setup hash_index" make check-tests -s || break; done
fails for me on iterations 7, 10, 17: ok 1 - test_setup 2557 ms not ok 2 - hash_index 24719 ms # (test process exited with exit code 2) postmaster.log contains: 2024-05-10 12:46:44.320 UTC checkpointer[1881151] LOG: checkpoint starting: immediate force wait 2024-05-10 12:46:44.365 UTC checkpointer[1881151] LOG: checkpoint complete: wrote 41 buffers (0.3%); 0 WAL file(s) added, 0 removed, 26 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, average=0.000 s; distance=439134 kB, estimate=527137 kB; lsn=0/3CE131F0, redo lsn=0/3CE13198 TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: "xloginsert.c", Line: 949, PID: 1881271 ExceptionalCondition at assert.c:52:13 XLogRecordAssemble at xloginsert.c:953:1 XLogInsert at xloginsert.c:520:9 hashbucketcleanup at hash.c:844:14 hashbulkdelete at hash.c:558:3 index_bulk_delete at indexam.c:760:1 vac_bulkdel_one_index at vacuum.c:2498:10
lazy_vacuum_one_index at vacuumlazy.c:2443:10 lazy_vacuum_all_indexes at vacuumlazy.c:2026:26 lazy_vacuum at vacuumlazy.c:1944:10 lazy_scan_heap at vacuumlazy.c:1050:3 heap_vacuum_rel at vacuumlazy.c:503:2 vacuum_rel at vacuum.c:2214:2 vacuum at vacuum.c:622:8 autovacuum_do_vac_analyze at autovacuum.c:3102:2 do_autovacuum at autovacuum.c:2425:23 AutoVacWorkerMain at autovacuum.c:1569:3 pgarch_die at pgarch.c:846:1 StartChildProcess at postmaster.c:3929:5 StartAutovacuumWorker at postmaster.c:3997:12 process_pm_pmsignal at postmaster.c:3809:3 ServerLoop at postmaster.c:1667:5 BackgroundWorkerInitializeConnection at postmaster.c:4156:1 main at main.c:184:3 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc71a8d7e40] postgres: autovacuum worker regression(_start+0x25)[0x556a8631a5e5] 2024-05-10 12:46:45.038 UTC checkpointer[1881151] LOG: checkpoint starting: immediate force wait 2024-05-10 12:46:45.965 UTC autovacuum worker[1881275] LOG: automatic analyze of table "regression.pg_catalog.pg_attribute" avg read rate: 0.000 MB/s, avg write rate: 5.409 MB/s buffer usage: 1094 hits, 0 misses, 27 dirtied system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.03 s 2024-05-10 12:46:46.892 UTC postmaster[1881150] LOG: server process (PID 1881271) was terminated by signal 6: Aborted 2024-05-10 12:46:46.892 UTC postmaster[1881150] DETAIL: Failed process was running: autovacuum: VACUUM ANALYZE public.hash_cleanup_heap (This can be reproduced with 027_stream_regress, of course, but it would take more time.) 
Best regards, Alexander

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 9047601534..aeb462c137 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -902,8 +902,21 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	 */
 	INIT_CRC32C(rdata_crc);
 	COMP_CRC32C(rdata_crc, hdr_scratch + SizeOfXLogRecord, hdr_rdt.len - SizeOfXLogRecord);
+
+	char		block1_copy[BLCKSZ];
+	char	   *block1_ptr = NULL;
+	int			block1_len = 0;
+
 	for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
+	{
+		if (!block1_ptr)
+		{
+			memcpy(block1_copy, rdt->data, rdt->len);
+			block1_ptr = rdt->data;
+			block1_len = rdt->len;
+		}
 		COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
+	}
 
 	/*
 	 * Ensure that the XLogRecord is not too large.
@@ -930,6 +943,12 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	rechdr->xl_prev = InvalidXLogRecPtr;
 	rechdr->xl_crc = rdata_crc;
 
+	if (block1_ptr)
+	{
+		pg_usleep(100);
+		Assert(memcmp(block1_ptr, block1_copy, block1_len) == 0);
+	}
+
 	return &hdr_rdt;
 }
Non-systematic handling of EINTR/EAGAIN/EWOULDBLOCK
Hello hackers, Looking at a recent failure on the buildfarm: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2024-04-30%2020%3A48%3A34 # poll_query_until timed out executing this query: # SELECT archived_count FROM pg_stat_archiver # expecting this output: # 1 # last actual query output: # 0 # with stderr: # Looks like your test exited with 29 just after 4. [23:01:41] t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00) Failed 12/16 subtests with the following error in the log: 2024-04-30 22:57:27.931 CEST [83115:1] LOG: archive command failed with exit code 1 2024-04-30 22:57:27.931 CEST [83115:2] DETAIL: The failed archive command was: cp "pg_wal/00010001_does_not_exist" "00010001_does_not_exist" ... 2024-04-30 22:57:28.070 CEST [47962:2] [unknown] LOG: connection authorized: user=pgbf database=postgres application_name=020_archive_status.pl 2024-04-30 22:57:28.072 CEST [47962:3] 020_archive_status.pl LOG: statement: SELECT archived_count FROM pg_stat_archiver 2024-04-30 22:57:28.073 CEST [83115:3] LOG: could not send to statistics collector: Resource temporarily unavailable and the corresponding code (on REL_13_STABLE):
static void
pgstat_send(void *msg, int len)
{
	int			rc;

	if (pgStatSock == PGINVALID_SOCKET)
		return;

	((PgStat_MsgHdr *) msg)->m_size = len;

	/* We'll retry after EINTR, but ignore all other failures */
	do
	{
		rc = send(pgStatSock, msg, len, 0);
	} while (rc < 0 && errno == EINTR);

#ifdef USE_ASSERT_CHECKING
	/* In debug builds, log send failures ... */
	if (rc < 0)
		elog(LOG, "could not send to statistics collector: %m");
#endif
}
I wonder whether this retry should be performed after EAGAIN (Resource temporarily unavailable), EWOULDBLOCK as well. With a simple send() wrapper (PFA) activated with LD_PRELOAD, I could reproduce this failure easily when running `make -s check -C src/test/recovery/ PROVE_TESTS="t/020*"` on REL_13_STABLE: t/020_archive_status.pl ..
1/16 # poll_query_until timed out executing this query: # SELECT archived_count FROM pg_stat_archiver # expecting this output: # 1 # last actual query output: # 0 # with stderr: # Looks like your test exited with 29 just after 4. t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00) Failed 12/16 subtests I also reproduced another failure (that lacks useful diagnostics, unfortunately): https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2022-11-10%2015%3A30%3A16 ... t/020_archive_status.pl .. 8/16 # poll_query_until timed out executing this query: # SELECT last_archived_wal FROM pg_stat_archiver # expecting this output: # 00010002 # last actual query output: # 00010001 # with stderr: # Looks like your test exited with 29 just after 13. t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00) Failed 3/16 subtests ... The "n == 64" condition in the cranky send() is needed to target exactly these failures. Without this restriction the test (and also `make check`) just hangs because of:
if (errno == EINTR)
	continue;			/* Ok if we were interrupted */

/*
 * Ok if no data writable without blocking, and the socket is in
 * non-blocking mode.
 */
if (errno == EAGAIN || errno == EWOULDBLOCK)
{
	return 0;
}
in internal_flush_buffer(). On the other hand, even with:
int send(int s, const void *buf, size_t n, int flags)
{
	if (rand() % 1 == 0) { errno = EINTR; return -1; }
	return real_send(s, buf, n, flags);
}
`make check` fails with many miscellaneous errors... Best regards, Alexander

#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t (*real_send)(int s, const void *buf, size_t n, int flags) = NULL;

__attribute__((constructor)) void lib_init(void)
{
	real_send = dlsym(RTLD_NEXT, "send");
}

int send(int s, const void *buf, size_t n, int flags)
{
	if (n == 64 && rand() % 10 == 0)
	{
		errno = EAGAIN;
		return -1;
	}
	return real_send(s, buf, n, flags);
}
Test equivclass interferes with tests tsearch and advisory_lock
Hello hackers, While trying to catch a sporadic regression test failure, I've discovered that tests tsearch and advisory_lock in the parallel_schedule's group: test: select_views ... tsearch ... advisory_lock indirect_toast equivclass might fail, depending on timing, because the test equivclass creates and drops int8 alias operators. With the attached patch applied to make the tests in question repeatable, the following test run fails for me: (printf 'test: test_setup\n'; \ printf 'test: tsearch equivclass %.0s\n' `seq 300`) >/tmp/test_schedule; make -s check-tests TESTS="--schedule=/tmp/test_schedule" ... # 17 of 601 tests failed. regression.diffs contains: diff -U3 .../src/test/regress/expected/tsearch.out .../src/test/regress/results/tsearch.out --- .../src/test/regress/expected/tsearch.out 2024-05-06 08:11:54.892649407 + +++ .../src/test/regress/results/tsearch.out 2024-05-06 08:13:35.514113420 + @@ -16,10 +16,7 @@ WHERE prsnamespace = 0 OR prsstart = 0 OR prstoken = 0 OR prsend = 0 OR -- prsheadline is optional prslextype = 0; - oid | prsname --+- -(0 rows) - +ERROR: cache lookup failed for type 18517 SELECT oid, dictname ... Or with advisory_lock: (printf 'test: test_setup\n'; \ printf 'test: advisory_lock equivclass %.0s\n' `seq 300`) >/tmp/test_schedule; make -s check-tests TESTS="--schedule=/tmp/test_schedule" ... # 1 of 601 tests failed.
regression.diffs contains: diff -U3 .../src/test/regress/expected/advisory_lock.out .../src/test/regress/results/advisory_lock.out --- .../src/test/regress/expected/advisory_lock.out 2024-04-10 14:36:57.709586678 + +++ .../src/test/regress/results/advisory_lock.out 2024-05-06 08:15:09.235456794 + @@ -14,40 +14,17 @@ SELECT locktype, classid, objid, objsubid, mode, granted FROM pg_locks WHERE locktype = 'advisory' AND database = :datoid ORDER BY classid, objid, objsubid; - locktype | classid | objid | objsubid | mode | granted ---+-+---+--+---+- - advisory | 0 | 1 | 1 | ExclusiveLock | t - advisory | 0 | 2 | 1 | ShareLock | t - advisory | 1 | 1 | 2 | ExclusiveLock | t - advisory | 2 | 2 | 2 | ShareLock | t -(4 rows) - +ERROR: cache lookup failed for type 17976 -- pg_advisory_unlock_all() shouldn't release xact locks ... With backtrace_functions = 'getBaseTypeAndTypmod' specified, I see the following stack trace of the error: 2024-05-06 08:30:51.344 UTC client backend[869479] pg_regress/tsearch ERROR: cache lookup failed for type 18041 2024-05-06 08:30:51.344 UTC client backend[869479] pg_regress/tsearch BACKTRACE:
getBaseTypeAndTypmod at lsyscache.c:2550:4
getBaseType at lsyscache.c:2526:1
find_coercion_pathway at parse_coerce.c:3131:18
can_coerce_type at parse_coerce.c:598:6
func_match_argtypes at parse_func.c:939:6
oper_select_candidate at parse_oper.c:327:5
oper at parse_oper.c:428:6
make_op at parse_oper.c:696:30
transformBoolExpr at parse_expr.c:1431:9 (inlined by) transformExprRecurse at parse_expr.c:226:13
transformExpr at parse_expr.c:133:22
transformWhereClause at parse_clause.c:1867:1
transformSelectStmt at analyze.c:1382:20 (inlined by) transformStmt at analyze.c:368:15
parse_analyze_fixedparams at analyze.c:110:15
pg_analyze_and_rewrite_fixedparams at postgres.c:691:5 (inlined by) exec_simple_query at postgres.c:1190:20
PostgresMain at postgres.c:4680:7
BackendMain at backend_startup.c:61:2
Best regards, Alexander

diff --git
a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out index 3d5de28354..2605554380 100644 --- a/src/test/regress/expected/equivclass.out +++ b/src/test/regress/expected/equivclass.out @@ -1,3 +1,7 @@ +SET client_min_messages TO 'warning'; +DROP TYPE IF EXISTS int8alias1, int8alias2 CASCADE; +DROP TABLE IF EXISTS ec0, ec1, ec2 CASCADE; +RESET client_min_messages; -- -- Tests for the planner's "equivalence class" mechanism -- diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out index cfa391ac51..6839be5f37 100644 --- a/src/test/regress/expected/tsearch.out +++ b/src/test/regress/expected/tsearch.out @@ -1,3 +1,6 @@ +SET client_min_messages TO 'warning'; +DROP TABLE IF EXISTS test_tsvector, test_tsquery CASCADE; +RESET client_min_messages; -- directory paths are passed to us in environment variables \getenv abs_srcdir PG_ABS_SRCDIR -- diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql index 77dd964ebf..fec4a416c8 100644 --- a/src/test/regress/sql/equivclass.sql +++ b/src/test/regress/sql/equivclass.sql @@ -1,3 +1,7 @@ +SET client_min_messages TO 'warning'; +DROP TYPE IF EXISTS int8alias1, int8alias2 CASCADE; +DROP TABLE IF EXISTS ec0, ec1, ec2 CASCADE; +RESET client_min_messages; -- -- Tests for the planner's "equivalence class" mechanism -- diff --git a/src/test/regress/sql/tsearch.sql
Re: cataloguing NOT NULL constraints
02.05.2024 19:21, Alvaro Herrera wrote: Now, you could claim that the standard doesn't mention INCLUDING/EXCLUDING CONSTRAINTS, therefore since we have come up with its definition then we should make it affect not-null constraints. However, there's also this note: NOTE 520 — <column constraint definition>s, except for NOT NULL, are not included in CDi; <column constraint definition>s are effectively transformed to <table constraint definition>s and are thereby also excluded. which is explicitly saying that not-null constraints are treated differently; in essence, with INCLUDING CONSTRAINTS we choose to affect the constraints that the standard says to ignore. Thank you for the very detailed and convincing explanation! Now I see what the last sentence here (from [1]) means: INCLUDING CONSTRAINTS CHECK constraints will be copied. No distinction is made between column constraints and table constraints. _Not-null constraints are always copied to the new table._ (I hadn't paid enough attention to it, because this exact paragraph is also present in previous versions...) [1] https://www.postgresql.org/docs/devel/sql-createtable.html Best regards, Alexander
Re: cataloguing NOT NULL constraints
Hello Alvaro, 01.05.2024 20:49, Alvaro Herrera wrote: Here are two patches that I intend to push soon (hopefully tomorrow). Thank you for fixing those issues! Could you also clarify, please, how CREATE TABLE ... LIKE is expected to work with NOT NULL constraints? I wonder whether EXCLUDING CONSTRAINTS (ALL) should cover not-null constraints too. What I'm seeing now, is that: CREATE TABLE t1 (i int, CONSTRAINT nn NOT NULL i); CREATE TABLE t2 (LIKE t1 EXCLUDING ALL); \d+ t2 -- ends with: Not-null constraints: "nn" NOT NULL "i" Or a similar case with PRIMARY KEY: CREATE TABLE t1 (i int PRIMARY KEY); CREATE TABLE t2 (LIKE t1 EXCLUDING CONSTRAINTS EXCLUDING INDEXES); \d+ t2 -- leaves: Not-null constraints: "t2_i_not_null" NOT NULL "i" Best regards, Alexander
Re: Typos in the code and README
Hello, 28.04.2024 11:05, David Rowley wrote: On Sat, 20 Apr 2024 at 16:09, David Rowley wrote: Here are a few more to see if it motivates anyone else to do a more thorough search for another batch. I've pushed these now. Please look also at the list of other typos and inconsistencies I've found on the master branch:
additional_notnulls -> old_notnulls (cf. b0e96f311)
ATAddCheckConstraint ->? ATAddCheckNNConstraint (cf. b0e96f311)
bt_page_check -> bt_target_page_check (cf. 5ae208720)
calblack_arg -> callback_arg
combinig -> combining
compaining -> complaining
ctllock - remove (cf. 53c2a97a9)
dabatase -> database
eval_const_exprs_mutator -> eval_const_expressions_mutator
ExecEndValuesScan - remove (cf. d060e921e)
get_json_nested_columns -> get_json_table_nested_columns
io_direct -> debug_io_direct
iS ->? IS (a neatnik-ism)
joing ->? join (cf. 20f90a0e4)
_jumbleRangeTblEntry - remove (cf. 367c989cd)
Mallroy -> Mallory
onstead -> instead
procedual -> procedural
psql_safe -> safe_psql
read_quoted_pattern -> read_quoted_string
repertiore -> repertoire
rightfirstdataoffset ->? rightfirstoffset (cf. 5ae208720)
Sincle ->? Since (perhaps the whole sentence could be rephrased)
sslnegotition=direct/requiredirectwould -> sslnegotiation=direct/requiredirect would
sslnegotitation -> sslnegotiation
walreciever -> walreceiver
xid_wrapround -> xid_wraparound
(some of them are located in doc/, so it's not a code-only change) I've attached the patch for your convenience, though maybe some of the suggestions are to be discarded. Best regards, Alexander

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 20da4a46ba..70f65b645a 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1086,7 +1086,7 @@ bt_entry_unique_check(BtreeCheckState *state, IndexTuple itup,
 	/*
 	 * Current tuple has no posting list. If TID is visible save info about it
-	 * for the next comparisons in the loop in bt_page_check(). Report
+	 * for the next comparisons in the loop in bt_target_page_check(). Report
 	 * duplicate if lVis_tid is already valid.
 	 */
 	else
@@ -1953,7 +1953,7 @@ bt_target_page_check(BtreeCheckState *state)
  * Note that !readonly callers must reverify that target page has not
  * been concurrently deleted.
  *
- * Save rightfirstdataoffset for detailed error message.
+ * Save rightfirstoffset for detailed error message.
  */
 static BTScanInsert
 bt_right_page_check_scankey(BtreeCheckState *state, OffsetNumber *rightfirstoffset)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d7e78204ad..f4b00399e4 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -331,7 +331,7 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
 xid_wraparound
-Runs the test suite under src/test/module/xid_wrapround.
+Runs the test suite under src/test/module/xid_wraparound.
 Not enabled by default because it is resource intensive.
diff --git a/doc/src/sgml/targets-meson.txt b/doc/src/sgml/targets-meson.txt
index 1c42dd3028..bd470c87a7 100644
--- a/doc/src/sgml/targets-meson.txt
+++ b/doc/src/sgml/targets-meson.txt
@@ -11,7 +11,7 @@ Code Targets:
   backend          Build backend and related modules
   bin              Build frontend binaries
   contrib          Build contrib modules
-  pl               Build procedual languages
+  pl               Build procedural languages
 Developer Targets:
   reformat-dat-files    Rewrite catalog data files into standard format
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index bf28400dd8..d0f3848c95 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2598,7 +2598,7 @@ _brin_parallel_heapscan(BrinBuildState *state)
  *
  * After waiting for all workers to finish, merge the per-worker results into
  * the complete index. The results from each worker are sorted by block number
- * (start of the page range). While combinig the per-worker results we merge
+ * (start of the page range). While combining the per-worker results we merge
  * summaries for the same page range, and also fill-in empty summaries for
  * ranges without any tuples.
  *
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f99ec38a4a..77b05cc0a7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -228,7 +228,6 @@ SimpleLruAutotuneBuffers(int divisor, int max)
  * name: name of SLRU. (This is user-visible, pick with care!)
  * nslots: number of page slots to use.
  * nlsns: number of LSN groups per page (set to zero if not relevant).
- * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
  * buffer_tranche_id: tranche ID to use for the SLRU's
Re: Removing unneeded self joins
30.04.2024 13:20, Alexander Korotkov wrote: On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin wrote: I've discovered another failure, introduced by d3d55ce57. Please try the following: CREATE TABLE t (a int unique, b float); SELECT * FROM t NATURAL JOIN LATERAL (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2; I think we should just forbid SJE in case when relations to be merged have cross-references with lateral vars. The draft patch for this is attached. I'd like to ask Alexander to test it, Richard and Andrei to review it. Thank you! Besides LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE in general. For instance: CREATE TABLE t (a int unique); INSERT INTO t SELECT * FROM generate_series (1,100); SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1 NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2; returned 100, 100, 100 for me, though with enable_self_join_removal = off, I got 4, 0, 1... Best regards, Alexander
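The counts reported with enable_self_join_removal = off match what independent sampling predicts: a NATURAL JOIN over a unique key keeps a row only if both samples kept it, so the expectation is 100 * 0.01 * 1.0 = 1, while a constant 100 suggests the two differently-parameterized scans were collapsed into a single scan. A quick Monte Carlo sketch (illustrative Python, not PostgreSQL code):

```python
import random

def sample(keys, p, rng):
    """Rough model of TABLESAMPLE BERNOULLI(p*100): keep each row independently."""
    return {k for k in keys if rng.random() < p}

rng = random.Random(42)
keys = range(100)

# NATURAL JOIN of two independent samples on a unique key = set intersection.
trials = 20000
avg = sum(len(sample(keys, 0.01, rng) & sample(keys, 1.0, rng))
          for _ in range(trials)) / trials
print(round(avg, 2))          # close to the expected 100 * 0.01 * 1.0 = 1

# Collapsing both scans into one relation that is sampled only once (what the
# self-join removal effectively did here) loses the first sample entirely.
print(len(sample(keys, 1.0, rng)))   # 100
```

Individual runs vary (hence the 4, 0, 1 above), but the mean stays near 1, never near 100.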
Re: Avoid orphaned objects dependencies, take 3
Hi Bertrand, 25.04.2024 10:20, Bertrand Drouvot wrote: postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1; ERROR: cache lookup failed for function 16400 This stuff does appear before we get a chance to call the new depLockAndCheckObject() function. I think this is what Tom was referring to in [1]: " So the only real fix for this would be to make every object lookup in the entire system do the sort of dance that's done in RangeVarGetRelidExtended. " The fact that those kinds of errors appear also somehow ensures that no orphaned dependencies can be created. I agree; the only thing that I'd change here is the error code. But I've discovered yet another possibility to get a broken dependency. Please try this script: res=0 numclients=20 for ((i=1;i<=100;i++)); do for ((c=1;c<=numclients;c++)); do echo " CREATE SCHEMA s_$c; CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8; ALTER CONVERSION myconv_$c SET SCHEMA s_$c; " | psql >psql1-$c.log 2>&1 & echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 & done wait pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; } for ((c=1;c<=numclients;c++)); do echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1 done done psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);" It fails for me (with the v4 patch applied) as follows: pg_dump: error: schema with OID 16392 does not exist on iteration 1 oid | conname | connamespace | conowner | conforencoding | contoencoding | conproc | condefault ---+--+--+--++---+---+ 16396 | myconv_6 | 16392 | 10 | 8 | 6 | iso8859_1_to_utf8 | f Best regards, Alexander
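The window behind this looks like a classic check-then-act race: the ALTER CONVERSION session looks the schema up, the concurrent DROP SCHEMA commits in between, and the dependency row ends up pointing at a namespace OID that no longer exists. A toy model of the interleaving (hypothetical Python, not PostgreSQL internals; the OIDs are taken from the pg_conversion output above):

```python
# Toy model of the race; the control flow is hypothetical, not server code.

def run(lock_and_recheck):
    namespaces = {16392: "s_6"}
    conversions = {}
    target = 16392

    # Session 1 (ALTER CONVERSION ... SET SCHEMA s_6): initial lookup succeeds.
    assert target in namespaces

    # Session 2 (DROP SCHEMA s_6 RESTRICT) commits in the window -- unless
    # session 1 pinned the referenced namespace with a lock first.
    if not lock_and_recheck:
        del namespaces[target]

    # Session 1 resumes and records the dependency row.  Without the fix it
    # never rechecks existence; with it, the recheck happens under the lock.
    if not lock_and_recheck or target in namespaces:
        conversions[16396] = {"conname": "myconv_6", "connamespace": target}

    return [c for c in conversions.values()
            if c["connamespace"] not in namespaces]

print(len(run(lock_and_recheck=False)))  # 1: orphaned row, as pg_dump found
print(len(run(lock_and_recheck=True)))   # 0
```

The same shape applies to any recordDependencyOn() path where the referenced object can be dropped concurrently.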
Re: Removing unneeded self joins
Hello Alexander, 23.10.2023 12:47, Alexander Korotkov wrote: I think this patch makes substantial improvement to query planning. It has received plenty of reviews. The code is currently in quite good shape. I didn't manage to find the cases when this optimization causes significant overhead to planning time. Even if such cases will be spotted there is a GUC option to disable this feature. So, I'll push this if there are no objections. I've discovered another failure, introduced by d3d55ce57. Please try the following: CREATE TABLE t (a int unique, b float); SELECT * FROM t NATURAL JOIN LATERAL (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2; With asserts enabled, it triggers TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: "initsplan.c", Line: 697, PID: 3074054 ExceptionalCondition at assert.c:52:13 create_lateral_join_info at initsplan.c:700:8 query_planner at planmain.c:257:2 grouping_planner at planner.c:1523:17 subquery_planner at planner.c:1098:2 standard_planner at planner.c:415:9 planner at planner.c:282:12 pg_plan_query at postgres.c:904:9 pg_plan_queries at postgres.c:996:11 exec_simple_query at postgres.c:1193:19 PostgresMain at postgres.c:4684:27 With no asserts, I get: ERROR: failed to construct the join relation Please take a look at this. Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
30.04.2024 03:10, Dmitry Koval wrote: Hi! 1. 29.04.2024 21:00, Alexander Lakhin wrote: I still wonder, why that constraint (now with a less questionable name) is created during MERGE? The SPLIT/MERGE PARTITION(S) commands for creating partitions reuse the existing code of CREATE TABLE .. LIKE ... command. A new partition was created with the name "merge-16385-26BCB0-tmp" (since there was an old partition with the same name). The constraint "merge-16385-26BCB0-tmp_i_not_null" was created too together with the partition. Subsequently, the table was renamed, but the constraint was not. Now a new partition is immediately created with the correct name (the old partition is renamed). Maybe I'm doing something wrong, but the following script: CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i); CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); CREATE TABLE t2 (LIKE t INCLUDING ALL); CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL); creates tables t2, tp2 without not-null constraints. But after ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; I see: \d+ tp_0 ... Indexes: "tp_0_pkey" PRIMARY KEY, btree (i) Not-null constraints: "tp_0_i_not_null" NOT NULL "i" Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi Dmitry, 19.04.2024 02:26, Dmitry Koval wrote: 18.04.2024 19:00, Alexander Lakhin wrote: leaves a strange constraint: \d+ t* Table "public.tp_0" ... Not-null constraints: "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" Thanks! Attached fix (with test) for this case. The patch should be applied after patches v6-0001- ... .patch ... v6-0004- ... .patch I still wonder why that constraint (now with a less questionable name) is created during MERGE? That is, before MERGE, two partitions have only PRIMARY KEY indexes, with no not-null constraint, and you can manually remove the constraint after MERGE, so maybe it's not necessary... Best regards, Alexander
Re: A failure in prepared_xacts test
Hello Tom and Michael, 29.04.2024 08:11, Tom Lane wrote: Michael Paquier writes: If you grep the source tree, you'd notice that a prepared transaction named gxid only exists in the 2PC tests of ECPG, in src/interfaces/ecpg/test/sql/twophase.pgc. So the origin of the failure comes from a race condition due to test parallelization, because the scan of pg_prepared_xacts affects all databases with installcheck, and in your case it means that the scan of pg_prepared_xacts was running in parallel with the ECPG tests during an installcheck. Up to now, we've only worried about whether tests running in parallel within a single test suite can interact. It's quite scary to think that the meson setup has expanded the possibility of interactions to our entire source tree. Maybe that was a bad idea and we should fix the meson infrastructure to not do that. I fear that otherwise, we'll get bit regularly by very-low-probability bugs of this kind. Yes, I'm afraid of the same. For example, the test failure [1] is of that ilk, I guess. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-04-17%2016%3A33%3A23 Best regards, Alexander
Re: A failure in prepared_xacts test
Hello Richard, 29.04.2024 04:12, Richard Guo wrote: Does anyone have any clue to this failure? FWIW, after another run of this test, the failure just disappears. Does it suggest that the test case is flaky? I think this could be caused by the ecpg test twophase being executed simultaneously with the test prepared_xacts, thanks to meson's job parallelization. Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hello, 28.04.2024 03:59, Alexander Korotkov wrote: The revised patchset is attached. I'm going to push it if there are no objections. I have one additional question regarding security, if you don't mind: What permissions should a user have to perform split/merge? When we deal with mixed ownership, say, bob is an owner of a partitioned table, but not an owner of a partition, should we allow him to perform merge with that partition? Consider the following script: CREATE ROLE alice; GRANT CREATE ON SCHEMA public TO alice; SET SESSION AUTHORIZATION alice; CREATE TABLE t (i int PRIMARY KEY, t text, u text) PARTITION BY RANGE (i); CREATE TABLE tp_00 PARTITION OF t FOR VALUES FROM (0) TO (10); CREATE TABLE tp_10 PARTITION OF t FOR VALUES FROM (10) TO (20); CREATE POLICY p1 ON tp_00 USING (u = current_user); ALTER TABLE tp_00 ENABLE ROW LEVEL SECURITY; INSERT INTO t(i, t, u) VALUES (0, 'info for bob', 'bob'); INSERT INTO t(i, t, u) VALUES (1, 'info for alice', 'alice'); RESET SESSION AUTHORIZATION; CREATE ROLE bob; GRANT CREATE ON SCHEMA public TO bob; ALTER TABLE t OWNER TO bob; GRANT SELECT ON TABLE tp_00 TO bob; SET SESSION AUTHORIZATION bob; SELECT * FROM tp_00; --- here bob can see his info only \d Schema | Name | Type | Owner +---+---+--- public | t | partitioned table | bob public | tp_00 | table | alice public | tp_10 | table | alice -- but then bob can do: ALTER TABLE t MERGE PARTITIONS (tp_00, tp_10) INTO tp_00; -- (yes, he also can detach the partition tp_00, but then he couldn't -- re-attach nor read it) \d Schema | Name | Type | Owner +---+---+--- public | t | partitioned table | bob public | tp_00 | table | bob Thus bob has effectively captured the partition with the data. What do you think, does this create a new security risk? Best regards, Alexander
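In other words, the concern is that MERGE PARTITIONS checks ownership of the partitioned table only, yet drops partitions owned by someone else and creates the replacement owned by the user running the command, so the RLS policy on the old partition vanishes with it. A toy sketch of that permission model (hypothetical Python; `merge_partitions` and the owners dict are illustrative, not server code):

```python
# Hypothetical sketch of the permission concern described above.

def merge_partitions(user, parent, parts, owners):
    """Merge `parts` of `parent` into a new partition named after parts[0]."""
    if owners[parent] != user:
        raise PermissionError("must own the partitioned table")
    for p in parts:              # old partitions (and their RLS policies,
        del owners[p]            # like p1 on tp_00) are dropped regardless
                                 # of who owns them
    owners[parts[0]] = user      # the replacement is owned by the merger
    return owners

owners = {"t": "bob", "tp_00": "alice", "tp_10": "alice"}
owners = merge_partitions("bob", "t", ["tp_00", "tp_10"], owners)
print(owners["tp_00"])  # bob -- the rows formerly guarded by alice's policy
                        # are now in a partition bob fully owns
```

Requiring ownership of each merged partition as well (the rule DROP TABLE would apply) would close this path.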
Re: partitioning and identity column
27.04.2024 18:00, Alexander Lakhin wrote: Please look also at another script, which produces the same error: I've discovered yet another problematic case: CREATE TABLE tbl1 (a int GENERATED ALWAYS AS IDENTITY, b text) PARTITION BY LIST (a); CREATE TABLE tbl2 (b text, a int NOT NULL); ALTER TABLE tbl1 ATTACH PARTITION tbl2 DEFAULT; INSERT INTO tbl2 DEFAULT VALUES; ERROR: no owned sequence found Though it works with tbl2(a int NOT NULL, b text)... Take a look at this too, please. Best regards, Alexander
Re: partitioning and identity column
Hello Ashutosh, 26.04.2024 21:00, Alexander Lakhin wrote: 26.04.2024 15:57, Ashutosh Bapat wrote: Thanks Alexander for the report. On Fri, Apr 26, 2024 at 5:30 PM Alexander Lakhin wrote: CREATE TABLE tbl3 (LIKE tbl2 INCLUDING IDENTITY); ERROR: no owned sequence found Do you want to track this in open items? If you are inclined to fix this behavior, I would add this item. Please look also at another script, which produces the same error: CREATE TABLE tbl1 (a int GENERATED BY DEFAULT AS IDENTITY, b text) PARTITION BY LIST (b); CREATE TABLE tbl2 PARTITION OF tbl1 DEFAULT; ALTER TABLE tbl1 ALTER COLUMN a SET DATA TYPE bigint; ERROR: no owned sequence found (On 699586315~1, it executes successfully and changes the data type of the identity column and its sequence.) Best regards, Alexander
Re: partitioning and identity column
26.04.2024 15:57, Ashutosh Bapat wrote: Thanks Alexander for the report. On Fri, Apr 26, 2024 at 5:30 PM Alexander Lakhin wrote: CREATE TABLE tbl3 (LIKE tbl2 INCLUDING IDENTITY); ERROR: no owned sequence found I don't think creating a table like a partition is common or even useful. Usually it would create it from the partitioned table. But if we consider that to be a use case, I think the error is expected since a partition doesn't have its own identity; it shares it with the partitioned table. Maybe we could give a better message. But I will look into this and fix it if the solution makes sense. Maybe it's uncommon, but it's allowed, so users may want to CREATE TABLE sometable (LIKE partX INCLUDING ALL), for example, if the partition has a somewhat different structure. And thinking about how such a restriction could be described in the docs, I would prefer to avoid this error at the implementation level. Do you want to track this in open items? If you are inclined to fix this behavior, I would add this item. Best regards, Alexander
Re: partitioning and identity column
Hello Ashutosh and Peter, 16.01.2024 21:59, Peter Eisentraut wrote: On 09.01.24 15:10, Ashutosh Bapat wrote: Here's complete patch-set. Looks good! Committed. Please take a look at a new error case introduced by 699586315: CREATE TABLE tbl1 (a int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY) PARTITION BY LIST (a); CREATE TABLE tbl2 PARTITION OF tbl1 DEFAULT; CREATE TABLE tbl3 (LIKE tbl2 INCLUDING IDENTITY); ERROR: no owned sequence found Best regards, Alexander
Re: Avoid orphaned objects dependencies, take 3
Hi, 25.04.2024 08:00, Bertrand Drouvot wrote: though concurrent create/drop operations can still produce the "cache lookup failed" error, which is probably okay, except that it is an INTERNAL_ERROR, which is assumed to be not easily triggered by users. I did not see any of those "cache lookup failed" during my testing with/without your script. During which test(s) did you see those with v3 applied? You can try, for example, table-trigger, or other tests that check for "cache lookup failed" psql output only (maybe you'll need to increase the iteration count). For instance, I've got (with v4 applied): 2024-04-25 05:48:08.102 UTC [3638763] ERROR: cache lookup failed for function 20893 2024-04-25 05:48:08.102 UTC [3638763] STATEMENT: CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c'); Or with function-function: 2024-04-25 05:52:31.390 UTC [3711897] ERROR: cache lookup failed for function 32190 at character 54 2024-04-25 05:52:31.390 UTC [3711897] STATEMENT: CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1; -- 2024-04-25 05:52:37.639 UTC [3720011] ERROR: cache lookup failed for function 34465 at character 54 2024-04-25 05:52:37.639 UTC [3720011] STATEMENT: CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1; Attached v4, simply adding more tests to v3. Thank you for working on this! Best regards, Alexander
Re: cataloguing NOT NULL constraints
24.04.2024 20:36, Alvaro Herrera wrote: So I added a restriction that we only accept such a change when recursively adding a constraint, or during binary upgrade. This should limit the damage: you're no longer able to change an existing constraint from NO INHERIT to YES INHERIT merely by doing another ALTER TABLE ADD CONSTRAINT. One thing that has me a little nervous about this whole business is whether we're set up to error out where some child table down the hierarchy has nulls, and we add a not-null constraint to it but fail to do a verification scan. I tried a couple of cases and AFAICS it works correctly, but maybe there are other cases I haven't thought about where it doesn't. Thank you for the fix! While studying the NO INHERIT option, I've noticed that the documentation probably misses its specification for NOT NULL: https://www.postgresql.org/docs/devel/sql-createtable.html where column_constraint is: ... [ CONSTRAINT constraint_name ] { NOT NULL | NULL | CHECK ( expression ) [ NO INHERIT ] | Also, I've found a weird behaviour with a non-inherited NOT NULL constraint for a partitioned table: CREATE TABLE pt(a int NOT NULL NO INHERIT) PARTITION BY LIST (a); CREATE TABLE dp(a int NOT NULL); ALTER TABLE pt ATTACH PARTITION dp DEFAULT; ALTER TABLE pt DETACH PARTITION dp; fails with: ERROR: relation 16389 has non-inherited constraint "dp_a_not_null" Though with an analogous check constraint, I get: CREATE TABLE pt(a int, CONSTRAINT nna CHECK (a IS NOT NULL) NO INHERIT) PARTITION BY LIST (a); ERROR: cannot add NO INHERIT constraint to partitioned table "pt" Best regards, Alexander
Re: Avoid orphaned objects dependencies, take 3
Hi Bertrand, 24.04.2024 11:38, Bertrand Drouvot wrote: Please find attached v2 that should not produce the issue anymore (I launched a lot of attempts without any issues). v1 was not strong enough as it was not always checking for the dependent object existence. v2 now always checks if the object still exists after the additional lock acquisition attempt while recording the dependency. I still need to think about v2 but in the meantime could you please also give v2 a try on your side? I gave more thought to v2 and the approach seems reasonable to me. Basically what it does is that in case the object is already dropped before we take the new lock (while recording the dependency) then the error message is a "generic" one (means it does not provide the description of the "already" dropped object). I think it makes sense to write the patch that way by abandoning the patch's ambition to tell the description of the dropped object in all the cases. Of course, I would be happy to hear others' thoughts about it. Please find v3 attached (which is v2 with more comments). Thank you for the improved version! I can confirm that it fixes that case. I've also tested other cases that fail on master (most of them also fail with v1), please try/look at the attached script. (There could be other broken-dependency cases, of course, but I think I've covered all the representative ones.) All tested cases work correctly with v3 applied — I couldn't get broken dependencies, though concurrent create/drop operations can still produce the "cache lookup failed" error, which is probably okay, except that it is an INTERNAL_ERROR, which is assumed to be not easily triggered by users. 
Best regards, Alexanderif [ "$1" == "function-schema" ]; then res=1 for ((i=1;i<=300;i++)); do ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 & echo " CREATE SCHEMA s; CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;'; CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;'; CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;'; CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;'; CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;'; " | psql >psql2.log 2>&1 & wait psql -c "DROP SCHEMA s CASCADE" >psql3.log 2>&1 done echo " SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql sleep 1 grep 'was terminated' server.log || res=0 fi if [ "$1" == "function-function" ]; then res=1 for ((i=1;i<=1000;i++)); do ( { for ((n=1;n<=20;n++)); do echo "DROP FUNCTION f();"; done } | psql ) >psql1.log 2>&1 & ( echo " CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN 1; CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1; CREATE FUNCTION f2() RETURNS int LANGUAGE SQL RETURN f() + 2; CREATE FUNCTION f3() RETURNS int LANGUAGE SQL RETURN f() + 3; CREATE FUNCTION f4() RETURNS int LANGUAGE SQL RETURN f() + 4; CREATE FUNCTION f5() RETURNS int LANGUAGE SQL RETURN f() + 5; " | psql ) >psql2.log 2>&1 & wait echo "ALTER FUNCTION f1() RENAME TO f01; SELECT f01()" | psql 2>&1 | grep 'cache lookup failed' && { echo "on iteration $i"; res=1; break; } psql -c "DROP FUNCTION f() CASCADE" >psql3.log 2>&1 done grep -A2 'cache lookup failed' server.log || res=0 fi if [ "$1" == "function-rettype" ]; then res=0 for ((i=1;i<=5000;i++)); do ( echo " CREATE DOMAIN id AS int; CREATE FUNCTION f1() RETURNS id LANGUAGE SQL RETURN 1; CREATE FUNCTION f2() RETURNS id LANGUAGE SQL RETURN 2; CREATE FUNCTION f3() RETURNS id LANGUAGE SQL RETURN 3; CREATE FUNCTION f4() RETURNS id LANGUAGE SQL 
RETURN 4; CREATE FUNCTION f5() RETURNS id LANGUAGE SQL RETURN 5; " | psql ) >psql1.log 2>&1 & ( echo "SELECT pg_sleep(random() / 100); DROP DOMAIN id" | psql ) >psql2.log 2>&1 & wait psql -c "SELECT f1()" 2>&1 | grep 'cache lookup failed' && { echo "on iteration $i"; res=1; break; } psql -c "DROP DOMAIN id CASCADE" >psql3.log 2>&1 done [ $res == 1 ] && psql -c "SELECT pn.oid, proname, pronamespace, proowner, prolang, prorettype FROM pg_proc pp INNER JOIN pg_namespace pn ON (pp.pronamespace = pn.oid) WHERE pn.nspname='public'" fi if [ "$1" == "function-argtype" ]; then res=0 for ((i=1;i<=5000;i++)); do ( echo " CREATE DOMAIN id AS int; CREATE FUNCTION f1(i id) RETURNS int LANGUAGE SQL RETURN 1; CREATE FUNCTION f2(i id) RETURNS int LANGUAGE SQL RETURN 2; CREATE FUNCTION f3(i id) RETURNS int LANGUAGE SQL RETURN 3; CREATE FUNCTION f4(i id) RETURNS int LANGUAGE SQL RETURN 4; CREATE FUNCTION f5(i id) RETURNS int LANGUAGE SQL RETURN 5; " | psql ) >psql1.log 2>&1 & ( echo "SELECT pg_sleep(random() / 1000); DROP DOMAIN id" | psql ) >psql2.log 2>&1 & wait psql -c "SELECT f1(1)" 2>&1 | grep 'cache lookup failed' && { echo "on iteration $i"; res=1; break; } psql -q -c "DROP DOMAIN id CASCADE" >/dev/null 2>&1 done grep -A1 'cache lookup failed' server.log || res=0 fi
Re: Avoid orphaned objects dependencies, take 3
22.04.2024 13:52, Bertrand Drouvot wrote: That's weird, I just launched it several times with the patch applied and I'm not able to see the seg fault (while I can see it consistently failing on the master branch). Are you 100% sure you tested it against a binary with the patch applied? Yes, at least I can't see what I'm doing wrong. Please try my self-contained script attached. Best regards, Alexander
#!/bin/bash git reset --hard HEAD; git clean -dfx >/dev/null wget https://www.postgresql.org/message-id/attachment/159685/v1-0001-Avoid-orphaned-objects-dependencies.patch git apply v1-0001-Avoid-orphaned-objects-dependencies.patch || exit 1 CPPFLAGS="-Og" ./configure --enable-debug --enable-cassert -q && make -j8 -s && TESTS="test_setup" make -s check-tests PGROOT=`pwd`/tmp_install export PGDATA=`pwd`/tmpdb export PATH="$PGROOT/usr/local/pgsql/bin:$PATH" export LD_LIBRARY_PATH="$PGROOT/usr/local/pgsql/lib" initdb >initdb.log 2>&1 export PGPORT=55432 echo " port=$PGPORT fsync = off " > $PGDATA/postgresql.auto.conf pg_ctl -l server.log start export PGDATABASE=regression createdb regression for ((i=1;i<=300;i++)); do ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 & echo " CREATE SCHEMA s; CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;'; CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;'; CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;'; CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;'; CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;'; " | psql >psql2.log 2>&1 & wait psql -c "DROP SCHEMA s CASCADE" >psql3.log done echo " SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql pg_ctl -w -t 5 -D "$PGDATA" stop
Re: Avoid orphaned objects dependencies, take 3
Hi Bertrand, 22.04.2024 11:45, Bertrand Drouvot wrote: Hi, This new thread is a follow-up of [1] and [2]. Problem description: We have occasionally observed objects having an orphaned dependency, the most common case we have seen is functions not linked to any namespaces. ... Looking forward to your feedback, This has reminded me of bug #17182 [1]. Unfortunately, with the patch applied, the following script: for ((i=1;i<=100;i++)); do ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 & echo " CREATE SCHEMA s; CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;'; CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;'; CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;'; CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;'; CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;'; " | psql >psql2.log 2>&1 & wait psql -c "DROP SCHEMA s CASCADE" >psql3.log done echo " SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql still ends with: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG: server process (PID 1388378) was terminated by signal 11: Segmentation fault 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL: Failed process was running: SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL [1] https://www.postgresql.org/message-id/flat/17182-a6baa001dd1784be%40postgresql.org Best regards, Alexander
Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Hello Peter, 07.04.2024 20:18, Peter Geoghegan wrote: On Sun, Apr 7, 2024 at 1:00 PM Alexander Lakhin wrote: SELECT * FROM t WHERE a < ANY (ARRAY[1]) AND b < ANY (ARRAY[1]); TRAP: failed Assert("so->numArrayKeys"), File: "nbtutils.c", Line: 560, PID: 3251267 I immediately see what's up here. Will fix this in the next short while. There is no bug here in builds without assertions, but it's probably best to keep the assertion, and to just make sure not to call _bt_preprocess_array_keys_final() unless it has real work to do. Please look at another case, where a similar Assert (but this time in _bt_preprocess_keys()) is triggered: CREATE TABLE t (a text, b text); INSERT INTO t (a, b) SELECT 'a', repeat('b', 100) FROM generate_series(1, 500) g; CREATE INDEX t_idx ON t USING btree(a); BEGIN; DECLARE c CURSOR FOR SELECT a FROM t WHERE a = 'a'; FETCH FROM c; FETCH RELATIVE 0 FROM c; TRAP: failed Assert("so->numArrayKeys"), File: "nbtutils.c", Line: 2582, PID: 1130962 Best regards, Alexander
Re: fix tablespace handling in pg_combinebackup
22.04.2024 00:49, Thomas Munro wrote: On Mon, Apr 22, 2024 at 12:00 AM Alexander Lakhin wrote: From what I can see, the following condition (namely, -l): if ($path =~ /^pg_tblspc\/(\d+)$/ && -l "$backup_path/$path") { push @tsoids, $1; return 0; } is false for junction points on Windows (cf [1]), but the target path is: Ah, yes, right, -l doesn't like junction points. Well, we're already using the Win32API::File package (it looks like a CPAN library, but I guess the Windows perl distros like Strawberry are all including it for free?). See PostgreSQL::Test::Utils::is_symlink(), attached. That seems to work as expected, but someone who actually knows perl can surely make it better. Then I hit the next problem: readlink C:\cirrus\build/testrun/pg_combinebackup/002_compare_backups\data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/16415: Inappropriate I/O control operation at C:/cirrus/src/test/perl/PostgreSQL/Test/Cluster.pm line 927. https://cirrus-ci.com/task/5162332353986560 I don't know where exactly that error message is coming from, but assuming that Strawberry Perl contains this code: https://github.com/Perl/perl5/blob/f936cd91ee430786a1bb6068a4a7c8362610dd5f/win32/win32.c#L2041 https://github.com/Perl/perl5/blob/f936cd91ee430786a1bb6068a4a7c8362610dd5f/win32/win32.c#L1976 ... then it's *very* similar to what we're doing in our own pgreadlink() code. I wondered if the buffer size might be too small for our path, but it doesn't seem so: https://github.com/Perl/perl5/blob/f936cd91ee430786a1bb6068a4a7c8362610dd5f/win32/win32.c#L1581C1-L1581C35 (I think MAX_PATH is 256 on Windows.) If there is some unfixable problem with what they're doing in their readlink(), then I guess it should be possible to read the junction point directly in Perl using Win32API::File::DeviceIoControl()... but I can't see what's wrong with it! Maybe it's not the same code? I wonder whether the target path (\??\) of that junction point is fully correct. 
I tried: > mklink /j "C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/test" \\?\C:\t1 Junction created for C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/test <<===>> \\?\C:\t1 and my $path = 'C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/test'; my $result = readlink($path); works for me: result: \\?\C:\t1 Whilst with: my $path = 'C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/16415'; readlink() fails with "Invalid argument". > dir "C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc" 04/22/2024 08:16 AM . 04/22/2024 08:16 AM .. 04/22/2024 06:52 AM 16415 [\??\C:\Users\ADMINI~1\AppData\Local\Temp\1zznr8FW5N\ts1backup] 04/22/2024 08:16 AM test [\\?\C:\t1] > dir "\??\C:\Users\ADMINI~1\AppData\Local\Temp\1zznr8FW5N\ts1backup" Directory of C:\??\C:\Users\ADMINI~1\AppData\Local\Temp\1zznr8FW5N\ts1backup File Not Found > dir "\\?\C:\t1" Directory of \\?\C:\t1 04/22/2024 08:06 AM . 04/22/2024 08:06 AM .. 0 File(s) 0 bytes Though > dir "C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/16415" somehow really works: Directory of C:\src\postgresql\build\testrun\pg_combinebackup\002_compare_backups\data\t_002_compare_backups_primary_data\backup\backup1\pg_tblspc\16415 04/22/2024 06:52 AM . 04/22/2024 06:52 AM .. 04/22/2024 06:52 AM PG_17_202404021 Best regards, Alexander
Re: fix tablespace handling in pg_combinebackup
Hello Thomas and Robert, 20.04.2024 08:56, Thomas Munro wrote: ... it still broke[4]. So I'm not sure what's going on... From what I can see, the following condition (namely, -l): if ($path =~ /^pg_tblspc\/(\d+)$/ && -l "$backup_path/$path") { push @tsoids, $1; return 0; } is false for junction points on Windows (cf [1]), but the target path is: Directory of C:\src\postgresql\build\testrun\pg_combinebackup\002_compare_backups\data\t_002_compare_backups_primary_data\backup\backup1\pg_tblspc 04/21/2024 02:05 PM . 04/21/2024 02:05 PM .. 04/21/2024 02:05 PM 16415 [\??\C:\Users\ADMINI~1\AppData\Local\Temp\xXMfNDMCot\ts1backup] [1] https://www.perlmonks.org/?node_id=1223819 Best regards, Alexander
Performance of JSON_TABLE vs jsonb_to_recordset
Hello hackers,

While playing with JSON_TABLE, I tried to replace tenk1 in the regression tests with a view based on JSON_TABLE, with the same content, and discovered that for one sub-optimal query its execution duration increased many-fold. With the preparation script attached, I see the following durations (for a build compiled by clang 18.1.3 with -O3):

explain (verbose, analyze) select (select max((select i.unique2 from tenk1 i where i.unique1 = o.unique1))) from tenk1 o;
-- original tenk1
 Execution Time: 4769.481 ms

explain (verbose, analyze) select (select max((select i.unique2 from jsonb_rs_tenk1 i where i.unique1 = o.unique1))) from jsonb_rs_tenk1 o;
-- Function Call: jsonb_to_recordset...
 Execution Time: 6841.767 ms

explain (verbose, analyze) select (select max((select i.unique2 from jsontable_tenk1 i where i.unique1 = o.unique1))) from jsontable_tenk1 o;
-- Table Function Call: JSON_TABLE...
 Execution Time: 288310.131 ms
(with 63% of the time spent inside ExecEvalJsonExprPath())

Just for fun, I've also tested XMLTABLE with similar content:

explain (verbose, analyze) select (select max((select i.unique2 from xml_tenk1 i where i.unique1 = o.unique1))) from xml_tenk1 o;
-- Table Function Call: XMLTABLE...
 Execution Time: 1235066.636 ms

Maybe it's worth adding a note to the JSON_TABLE() documentation saying that jsonb_to_recordset is (inherently?) more performant when processing arrays of flat structures, so that users don't have to rediscover this fact...

Best regards,
Alexander

jsontable-perf-setup.sql
Description: application/sql
Re: subscription/026_stats test is intermittently slow?
Hello Michael and Robert,

20.04.2024 05:57, Michael Paquier wrote:
On Fri, Apr 19, 2024 at 01:57:41PM -0400, Robert Haas wrote: It looks to me like in the first run it took 3 minutes for the replay_lsn to catch up to the desired value, and in the second run, two seconds. I think I have seen previous instances where something similar happened, although in those cases I did not stop to record any details. Have others seen this? Is there something we can/should do about it?
FWIW, I've also seen delays as well with this test on a few occasions.

Thanks for mentioning it. It reminds me of
https://www.postgresql.org/message-id/858a7622-2c81-1687-d1df-1322dfcb2e72%40gmail.com
At least, I could reproduce such a delay with the attached patch applied.

Best regards,
Alexander

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 66070e9131..9368a3a897 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -227,6 +227,7 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 		if (rc & WL_LATCH_SET)
 		{
+			pg_usleep(30);
 			ResetLatch(MyLatch);
 			CHECK_FOR_INTERRUPTS();
 		}
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index d1d68fad9a..a82f1c051c 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -37,6 +37,7 @@ sub create_sub_pub_w_errors
 		$db, qq[
 	BEGIN;
+	SELECT pg_sleep(0.5);
 	CREATE TABLE $table_name(a int primary key);
 	INSERT INTO $table_name VALUES (1);
 	COMMIT;
Re: tablecmds.c/MergeAttributes() cleanup
Hello,

30.01.2024 09:22, Ashutosh Bapat wrote:

Please look at the segmentation fault triggered by the following query since 4d969b2f8:
CREATE TABLE t1(a text COMPRESSION pglz);
CREATE TABLE t2(a text);
CREATE TABLE t3() INHERITS(t1, t2);
NOTICE: merging multiple inherited definitions of column "a"
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.

Core was generated by `postgres: law regression [local] CREATE TABLE '.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
#1 0x5606fbcc9d52 in MergeAttributes (columns=0x0, supers=supers@entry=0x5606fe293d30, relpersistence=112 'p', is_partition=false, supconstr=supconstr@entry=0x7fff4046d410, supnotnulls=supnotnulls@entry=0x7fff4046d418) at tablecmds.c:2811
#2 0x5606fbccd764 in DefineRelation (stmt=stmt@entry=0x5606fe26a130, relkind=relkind@entry=114 'r', ownerId=10, ownerId@entry=0, typaddress=typaddress@entry=0x0, queryString=queryString@entry=0x5606fe2695c0 "CREATE TABLE t3() INHERITS(t1, t2);") at tablecmds.c:885

This bug existed even before the refactoring. It happens because strcmp() is called on NULL input (t2's compression is NULL). I already have a fix for this and will be posting it in [1].
[1] https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org

Now that that fix is closed with RwF [1], shouldn't this crash issue be added to Open Items for v17? (I couldn't reproduce the crash on 4d969b2f8~1 nor on REL_16_STABLE.)
https://commitfest.postgresql.org/47/4813/

Best regards,
Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
18.04.2024 20:49, Alvaro Herrera wrote:
On 2024-Apr-18, Alexander Lakhin wrote: I think the feature implementation should also provide tab completion for SPLIT/MERGE.
I don't think that we should be imposing on feature authors or committers the task of filling in tab-completion for whatever features they contribute. I mean, if they want to add that, cool; but if not, somebody else can do that, too. It's not a critical piece.

I agree; I just wanted to note its absence in the current implementation. But now, thanks to Dagfinn, we have the tab completion too.

I also have a question regarding "ALTER TABLE ... SET ACCESS METHOD". The current documentation says:
When applied to a partitioned table, there is no data to rewrite, but partitions created afterwards will default to the given access method unless overridden by a USING clause.
But MERGE/SPLIT behave differently (if one can assume that MERGE/SPLIT create new partitions under the hood):

CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
ALTER TABLE t SET ACCESS METHOD heap2;
CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
\d t+
Partitioned table "public.t"
...
Access method: heap2
Table "public.tp_0"
...
Access method: heap2
Table "public.tp_1"
...
Access method: heap2

ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;
Partitioned table "public.t"
...
Access method: heap2
Table "public.tp_0"
...
Access method: heap

Shouldn't it be changed, what do you think?

Best regards,
Alexander
Re: cataloguing NOT NULL constraints
Hello Alvaro, 18.04.2024 16:39, Alvaro Herrera wrote: I have pushed a fix which should hopefully fix this problem (d9f686a72e). Please give this a look. Thanks for reporting the issue. Please look at an assertion failure, introduced with d9f686a72: CREATE TABLE t(a int, NOT NULL a NO INHERIT); CREATE TABLE t2() INHERITS (t); ALTER TABLE t ADD CONSTRAINT nna NOT NULL a; TRAP: failed Assert("lockmode != NoLock || IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, true)"), File: "relation.c", Line: 67, PID: 2980258 On d9f686a72~1 this script results in: ERROR: cannot change NO INHERIT status of inherited NOT NULL constraint "t_a_not_null" on relation "t" Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi Alexander, 18.04.2024 13:35, Alexander Korotkov wrote: The revised patchset is attached. 1) I've split the fix for the CommandCounterIncrement() issue and the fix for relation persistence issue into a separate patch. 2) I've validated that the lock on the new partition is held in createPartitionTable() after ProcessUtility() as pointed out by Robert. So, no need to place the lock again. 3) Added fix for problematic error message as a separate patch [1]. 4) Added rename "salemans" => "salesmen" for tests as a separate patch. I think these fixes are reaching committable shape, but I'd like someone to check it before I push. I think the feature implementation should also provide tab completion for SPLIT/MERGE. (ALTER TABLE t S fills in only SET now.) Also, the following MERGE operation: CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i); CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2); ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0; leaves a strange constraint: \d+ t* Table "public.tp_0" ... Not-null constraints: "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i" Best regards, Alexander
clang's sanitizer makes stringToNode() extremely slow
Hello hackers,

When using a server built with clang (15, 18) with sanitizers enabled, the last query in the following script:

SET parallel_setup_cost = 0;
SET min_parallel_table_scan_size = 0;
SELECT a::text INTO t FROM generate_series(1, 1000) a;
\timing on
SELECT string_agg(a, ',') FROM t WHERE a = REPEAT('0', 40);

runs for more than a minute on my workstation (a larger repeat count gives a much longer duration):
Time: 66017.594 ms (01:06.018)

The typical stack trace for a running parallel worker is:
#0 0x559a23671885 in __sanitizer::internal_strlen(char const*) ()
#1 0x559a236568ed in StrtolFixAndCheck(void*, char const*, char**, char*, int) ()
#2 0x559a236423dc in __interceptor_strtol ()
#3 0x559a240027d9 in atoi (...) at readfuncs.c:629
#5 0x559a23fb03f0 in _readConst () at readfuncs.c:275
#6 parseNodeString () at ./readfuncs.switch.c:29
#7 0x559a23faa421 in nodeRead ( token=0x7fee75cf3bd2 "{CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location -1 :constvalue 44 [ 16 106 24 0 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48"..., tok_len=1) at read.c:338
#8 0x559a23faa916 in nodeRead (...) at read.c:452
#9 0x559a23fb3f34 in _readOpExpr () at ./readfuncs.funcs.c:279
#10 0x559a23fb0842 in parseNodeString () at ./readfuncs.switch.c:47
#11 0x559a23faa421 in nodeRead (...) at read.c:338
#12 0x559a23faa916 in nodeRead (...) at read.c:452
#13 0x559a23fefb74 in _readSeqScan () at ./readfuncs.funcs.c:3954
#14 0x559a23fb2b97 in parseNodeString () at ./readfuncs.switch.c:559
#15 0x559a23faa421 in nodeRead (...) at read.c:338
#16 0x559a23ffc033 in _readAgg () at ./readfuncs.funcs.c:4685
#17 0x559a23fb2dd3 in parseNodeString () at ./readfuncs.switch.c:611
#18 0x559a23faa421 in nodeRead (...) at read.c:338
#19 0x559a23feb340 in _readPlannedStmt () at ./readfuncs.funcs.c:3685
#20 0x559a23fb2ad1 in parseNodeString () at ./readfuncs.switch.c:541
#21 0x559a23faa421 in nodeRead (...)
at read.c:338
#22 0x559a23fa99d8 in stringToNodeInternal (...) at read.c:92
#24 0x559a23d66609 in ExecParallelGetQueryDesc (...) at execParallel.c:1250
#25 ParallelQueryMain (...) at execParallel.c:1424
#26 0x559a238cfe13 in ParallelWorkerMain (...) at parallel.c:1516
#27 0x559a241e5b6a in BackgroundWorkerMain (...) at bgworker.c:848
#28 0x559a241ec254 in postmaster_child_launch (...) at launch_backend.c:265
#29 0x559a241f1c15 in do_start_bgworker (...) at postmaster.c:4270
#30 maybe_start_bgworkers () at postmaster.c:4501
#31 0x559a241f486e in process_pm_pmsignal () at postmaster.c:3774
#32 ServerLoop () at postmaster.c:1667
#33 0x559a241f0ed6 in PostmasterMain (...) at postmaster.c:1372
#34 0x559a23ebe16d in main (...) at main.c:197

The flamegraph (which shows that readDatum() -> __interceptor_strtol() -> StrtolFixAndCheck() -> __sanitizer::internal_strlen() takes >99% of the time) is attached. (I could not reproduce this behaviour with gcc's sanitizer.)

Moreover, this query cannot be canceled; you can only kill -SIGKILL parallel workers to interrupt it.

Best regards,
Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hello Robert, 15.04.2024 17:30, Robert Haas wrote: On Sat, Apr 13, 2024 at 6:05 AM Alexander Korotkov wrote: Please, find a my version of this fix attached. I think we need to check relpersistence in a similar way ATTACH PARTITION or CREATE TABLE ... PARTITION OF do. I'm going to polish this a little bit more. + errmsg("\"%s\" is not an ordinary table", This is not a phrasing that we use in any other error message. We always just say "is not a table". Initially I was confused by that message, because of: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1) SERVER loopback OPTIONS (table_name 'lt_0_1'); CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2) SERVER loopback OPTIONS (table_name 'lt_1_2'); ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2; ERROR: "ftp_0_1" is not a table (Isn't a foreign table a table?) And also: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE t2 (i int) PARTITION BY RANGE (i); ALTER TABLE t MERGE PARTITIONS (tp_0_1, t2) INTO tpn; ERROR: "t2" is not a table (Isn't a partitioned table a table?) And in fact, an ordinary table is not suitable for MERGE anyway: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE t2 (i int); ALTER TABLE t MERGE PARTITIONS (tp_0_1, t2) INTO tpn; ERROR: "t2" is not a partition So I don't think that "an ordinary table" is a good (unambiguous) term either. Best regards, Alexander
Re: Parallel CREATE INDEX for BRIN indexes
Hello Tomas,

14.04.2024 20:09, Tomas Vondra wrote:
I've pushed this, including backpatching the two fixes. I've reduced the amount of data needed by the test, and made sure it works on 32-bits too (I was a bit worried it might be sensitive to that, but that seems not to be the case).

I've discovered that that test addition brings some instability to the test. With the following pageinspect/Makefile modification:
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin $(shell printf 'brin %.0s' `seq 99`) gin gist hash checksum oldextversions

echo "autovacuum_naptime = 1" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C contrib/pageinspect
fails for me as below:
...
ok 17 - brin 127 ms
not ok 18 - brin 140 ms
ok 19 - brin 125 ms
...
# 4 of 107 tests failed.

The following change:
-CREATE TABLE brin_parallel_test (a int, b text, c bigint) WITH (fillfactor=40);
+CREATE TEMP TABLE brin_parallel_test (a int, b text, c bigint) WITH (fillfactor=40);
(similar to e2933a6e1) makes the test pass reliably for me.

Best regards,
Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi Dmitry,

12.04.2024 16:04, Dmitry Koval wrote:
Hi! Attached is a patch with corrections based on comments in previous letters (I think these corrections are not final). I'll be very grateful for feedbacks and bug reports.
11.04.2024 20:00, Alexander Lakhin wrote:
> may be an attempt to merge into implicit
> pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does?
Corrected. Result is:

Thank you! Still, we're now able to create a partition in the pg_temp schema explicitly. Please try:
ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2;
in the scenario [1], and you'll get the same empty table.

[1] https://www.postgresql.org/message-id/fdaa003e-919c-cbc9-4f0c-e4546e96bd65%40gmail.com

Best regards,
Alexander
Re: Issue with the PRNG used by Postgres
12.04.2024 08:05, Alexander Lakhin wrote:
2024-04-12 05:00:17.981 UTC [762336] PANIC: stuck spinlock detected at WaitBufHdrUnlocked, bufmgr.c:5726

It looks like that spinlock issue is caused by a race condition/deadlock. What I see when the test fails is:
A client backend executing "DROP DATABASE conflict_db" performs dropdb() -> DropDatabaseBuffers() -> InvalidateBuffer().
At the same time, bgwriter performs (for the same buffer): BgBufferSync() -> SyncOneBuffer().
When InvalidateBuffer() is called, the buffer refcount is zero; then bgwriter pins the buffer, thus increasing the refcount, and InvalidateBuffer() gets into the retry loop. bgwriter calls UnpinBuffer() -> UnpinBufferNoOwner() -> WaitBufHdrUnlocked(), which waits for the !BM_LOCKED state, while InvalidateBuffer() waits for the buffer refcount to decrease.

As it turns out, it's not related to spinlocks' specifics or PRNG; it was just a serendipitous find.

Best regards,
Alexander
Re: Issue with the PRNG used by Postgres
Hi Andres,

12.04.2024 07:41, Andres Freund wrote:
FWIW, I just reproduced the scenario with signals. I added tracking of the total time actually slept and lost to SpinDelayStatus, and added a function to trigger a wait on a spinlock. To wait less, I set max_standby_streaming_delay=0.1, but that's just for easier testing in isolation. In reality that could have been reached before the spinlock is even acquired. On a standby, while a recovery conflict is happening:
PANIC: XX000: stuck spinlock detected at crashme, path/to/file:line, after 4.38s, lost 127.96s
So right now it's really not hard to trigger the stuck-spinlock logic completely spuriously. This doesn't just happen with hot standby, there are plenty other sources of lots of signals being sent.

I managed to trigger that logic when trying to construct a reproducer for bug #18426. With the following delays added:

--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1776,6 +1776,7 @@ retry:
 	 */
 	if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
 	{
+		pg_usleep(30L);
 		UnlockBufHdr(buf, buf_state);
 		LWLockRelease(oldPartitionLock);
 		/* safety check: should definitely not be our *own* pin */
@@ -5549,6 +5550,7 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits,
 	Assert(buf_state & BM_IO_IN_PROGRESS);
+	pg_usleep(300);
 	buf_state &= ~(BM_IO_IN_PROGRESS | BM_IO_ERROR);
 	if (clear_dirty && !(buf_state & BM_JUST_DIRTIED))
 		buf_state &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED);

and /tmp/temp.config:
bgwriter_delay = 10

TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/recovery PROVE_TESTS="t/032*"
fails for me on iterations 22, 23, 37:
2024-04-12 05:00:17.981 UTC [762336] PANIC: stuck spinlock detected at WaitBufHdrUnlocked, bufmgr.c:5726

I haven't investigated this case yet.

Best regards,
Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
11.04.2024 16:27, Dmitry Koval wrote: Added correction (and test), see v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch. Thank you for the correction, but may be an attempt to merge into implicit pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does? Please look also at another anomaly with schemas: CREATE SCHEMA s1; CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2); ALTER TABLE t SPLIT PARTITION tp_0_2 INTO (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1), PARTITION s1.tp1 FOR VALUES FROM (1) TO (2)); results in: \d+ s1.* Did not find any relation named "s1.*" \d+ tp* Table "public.tp0" ... Table "public.tp1" ... Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi Dmitry, 11.04.2024 11:59, Dmitry Koval wrote: FWIW, I also proposed a patch earlier that fixes error messages and comments in the split partition code Sorry, I thought all the fixes you suggested were already included in v1-0002-Fixes-for-english-text.patch (but they are not). Added missing lines to v2-0002-Fixes-for-english-text.patch. It seems to me that v2-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch is not complete either. Take a look, please: CREATE TABLE t (i int) PARTITION BY RANGE (i); SET search_path = pg_temp, public; CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); -- fails with: ERROR: cannot create a temporary relation as partition of permanent relation "t" But: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1); CREATE TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2); INSERT INTO t VALUES(0), (1); SELECT * FROM t; -- the expected result is: i --- 0 1 (2 rows) SET search_path = pg_temp, public; ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2; -- succeeds, and \c - SELECT * FROM t; -- gives: i --- (0 rows) Please also ask your tech writers to check contents of src/test/sql/*, if possible (perhaps, they'll fix "salesmans" and improve grammar). Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
10.04.2024 12:00, Alexander Lakhin wrote:
Hello Alexander and Dmitry,
10.04.2024 02:03, Alexander Korotkov wrote: Thank you. I've pushed this fix with minor corrections from me.

Please look at another anomaly with MERGE.

CREATE TEMP TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
fails with
ERROR: cannot create a permanent relation as partition of temporary relation "t"

But
CREATE TEMP TABLE t (i int) PARTITION BY RANGE (i);
CREATE TEMP TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TEMP TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2);
ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
succeeds and we get:

regression=# \d+ t*
                                  Partitioned table "pg_temp_1.t"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           |          |         | plain   |             |              |
Partition key: RANGE (i)
Partitions: tp_0_2 FOR VALUES FROM (0) TO (2)

                                      Table "public.tp_0_2"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           |          |         | plain   |             |              |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))

Best regards,
Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hello Alexander and Dmitry, 10.04.2024 02:03, Alexander Korotkov wrote: On Mon, Apr 8, 2024 at 11:43 PM Dmitry Koval wrote: Attached fix for the problems found by Alexander Lakhin. About grammar errors. Unfortunately, I don't know English well. Therefore, I plan (in the coming days) to show the text to specialists who perform technical translation of documentation. Thank you. I've pushed this fix with minor corrections from me. Thank you for fixing that defect! Please look at an error message emitted for foreign tables: CREATE TABLE t (i int) PARTITION BY RANGE (i); CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1) SERVER loopback OPTIONS (table_name 'lt_0_1'); CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2) SERVER loopback OPTIONS (table_name 'lt_1_2'); ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2; ERROR: "ftp_0_1" is not a table Shouldn't it be more correct/precise? Best regards, Alexander
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
08.04.2024 18:08, Tom Lane wrote: Alexander Lakhin writes: Now that ExecQueryUsingCursor() is gone, it's not clear, what does the following comment mean:? * We must turn off gexec_flag to avoid infinite recursion. Note that * this allows ExecQueryUsingCursor to be applied to the individual query * results. Hmm, the point about recursion is still valid isn't it? I agree the reference to ExecQueryUsingCursor is obsolete, but I think we need to reconstruct what this comment is actually talking about. It's certainly pretty obscure ... Sorry, I wasn't clear enough, I meant to remove only that reference, not the quoted comment altogether. Best regards, Alexander
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Hello Daniel and Tom, 08.04.2024 17:25, Daniel Verite wrote: So I whacked the patch around till I liked it better, and pushed it. Thanks for taking care of this! Now that ExecQueryUsingCursor() is gone, it's not clear, what does the following comment mean:? * We must turn off gexec_flag to avoid infinite recursion. Note that * this allows ExecQueryUsingCursor to be applied to the individual query * results. Shouldn't it be removed? Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi Tender Wang,

08.04.2024 13:43, Tender Wang wrote:
Hi all, I went through the MERGE/SPLIT partition codes today, thanks for the works. I found some grammar errors: i. in error messages(Users can see this grammar errors, not friendly). ii. in codes comments

At a quick glance, I also saw:
NULL-value
partitionde
splited
temparary

And a trailing whitespace at:
the quarter partition back to monthly partitions:
warning: 1 line adds whitespace errors.

I'm also confused by "administrators" here:
https://www.postgresql.org/docs/devel/ddl-partitioning.html
(We can find on the same page, for instance: ... whereas table inheritance allows data to be divided in a manner of the user's choosing. It seems to me that "users" should work for merging partitions as well.)

Though the documentation addition requires more than just a quick glance, of course.

Best regards,
Alexander
Re: Speed up clean meson builds by ~25%
Hello Jelte, 08.04.2024 11:36, Jelte Fennema-Nio wrote: On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin wrote: As I wrote in [1], I didn't observe the issue with clang-18, so maybe it is fixed already. Perhaps it's worth rechecking... Using the attached script I got these timings. Clang is significantly slower in all of them. But especially with -Og the difference between is huge. gcc 11.4.0: 7.276s clang 18.1.3: 17.216s gcc 11.4.0 --debug: 7.441s clang 18.1.3 --debug: 18.164s gcc 11.4.0 --debug -Og: 2.418s clang 18.1.3 --debug -Og: 14.864s I reported this same issue to the LLVM project here: https://github.com/llvm/llvm-project/issues/87973 Maybe we're talking about different problems. At [1] Thomas (and then I) was unhappy with more than 200 seconds duration... https://www.postgresql.org/message-id/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com Best regards, Alexander
Re: Speed up clean meson builds by ~25%
Hello Michael, 08.04.2024 08:23, Michael Paquier wrote: On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote: Agreed. While not a full solution, I think this patch is still good to address some of the pain: Waiting 10 seconds at the end of my build with only 1 of my 10 cores doing anything. So while this doesn't decrease CPU time spent it does decrease wall-clock time significantly in some cases, with afaict no downsides. Well, this is also painful with ./configure. So, even if we are going to move away from it at this point, we still need to support it for a couple of years. It looks to me that letting the clang folks know about the situation is the best way forward. As I wrote in [1], I didn't observe the issue with clang-18, so maybe it is fixed already. Perhaps it's worth rechecking... [1] https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
08.04.2024 01:15, Alexander Korotkov wrote: Thank you for spotting this. This seems like a missing check. I'm going to get a closer look at this tomorrow. Thanks! There is also an anomaly with the MERGE command: CREATE TABLE t1 (i int, a int, b int, c int) PARTITION BY RANGE (a, b); CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES FROM (1, 1) TO (1, 2); CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t); CREATE TABLE t2pa PARTITION OF t2 FOR VALUES FROM ('A') TO ('C'); CREATE TABLE t3 (i int, t text); ALTER TABLE t2 MERGE PARTITIONS (t1p1, t2pa, t3) INTO t2p; leads to: ERROR: partition bound for relation "t3" is null WARNING: problem in alloc set PortalContext: detected write past chunk end in block 0x55f1ef42f820, chunk 0x55f1ef42ff40 WARNING: problem in alloc set PortalContext: detected write past chunk end in block 0x55f1ef42f820, chunk 0x55f1ef42ff40 (I'm also not sure that the error message is clear enough (can't we say "relation X is not a partition of relation Y" in this context, as in MarkInheritDetached(), for example?).) Whilst with ALTER TABLE t2 MERGE PARTITIONS (t1p1, t2pa) INTO t2p; I get: Program terminated with signal SIGSEGV, Segmentation fault. #0 pg_detoast_datum_packed (datum=0x1) at fmgr.c:1866 1866 if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum)) (gdb) bt #0 pg_detoast_datum_packed (datum=0x1) at fmgr.c:1866 #1 0x55d77d00fde2 in bttextcmp (...) at ../../../../src/include/postgres.h:314 #2 0x55d77d03fa27 in FunctionCall2Coll (...) at fmgr.c:1161 #3 0x55d77ce1572f in partition_rbound_cmp (...) at partbounds.c:3525 #4 0x55d77ce157b9 in qsort_partition_rbound_cmp (...) at partbounds.c:3816 #5 0x55d77d0982ef in qsort_arg (...) at ../../src/include/lib/sort_template.h:316 #6 0x55d77ce1d109 in calculate_partition_bound_for_merge (...) at partbounds.c:5786 #7 0x55d77cc24b2b in transformPartitionCmdForMerge (...) at parse_utilcmd.c:3524 #8 0x55d77cc2b555 in transformAlterTableStmt (...) 
at parse_utilcmd.c:3812 #9 0x55d77ccab17c in ATParseTransformCmd (...) at tablecmds.c:5650 #10 0x55d77ccafd09 in ATExecCmd (...) at tablecmds.c:5589 ... Best regards, Alexander
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi Alexander and Dmitry, 07.04.2024 01:22, Alexander Korotkov wrote: I've pushed 0001 and 0002. I didn't push 0003 for the following reasons. Please try the following (erroneous) query: CREATE TABLE t1(i int, t text) PARTITION BY LIST (t); CREATE TABLE t1pa PARTITION OF t1 FOR VALUES IN ('A'); CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t); ALTER TABLE t2 SPLIT PARTITION t1pa INTO (PARTITION t2a FOR VALUES FROM ('A') TO ('B'), PARTITION t2b FOR VALUES FROM ('B') TO ('C')); that triggers an assertion failure: TRAP: failed Assert("datums != NIL"), File: "partbounds.c", Line: 3434, PID: 1841459 or a segfault (in a non-assert build): Program terminated with signal SIGSEGV, Segmentation fault. #0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866 1866 if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum)) (gdb) bt #0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866 #1 0x55f38c5d5e3f in bttextcmp (...) at varlena.c:1834 #2 0x55f38c6030dd in FunctionCall2Coll (...) at fmgr.c:1161 #3 0x55f38c417c83 in partition_rbound_cmp (...) at partbounds.c:3525 #4 check_partition_bounds_for_split_range (...) at partbounds.c:5221 #5 check_partitions_for_split (...) at partbounds.c:5688 #6 0x55f38c256c49 in transformPartitionCmdForSplit (...) at parse_utilcmd.c:3451 #7 transformAlterTableStmt (...) at parse_utilcmd.c:3810 #8 0x55f38c2bdf9c in ATParseTransformCmd (...) at tablecmds.c:5650 ... Best regards, Alexander
Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Hello Peter, 03.04.2024 22:53, Peter Geoghegan wrote: On Mon, Apr 1, 2024 at 6:33 PM Peter Geoghegan wrote: Note: v18 doesn't have any adjustments to the costing, as originally planned. I'll probably need to post a revised patch with improved (or at least polished) costing in the next few days, so that others will have the opportunity to comment before I commit the patch. Attached is v19, which dealt with remaining concerns I had about the costing in selfuncs.c. My current plan is to commit this on Saturday morning (US Eastern time). Please look at an assertion failure (reproduced starting from 5bf748b86), triggered by the following query: CREATE TABLE t (a int, b int); CREATE INDEX t_idx ON t (a, b); INSERT INTO t (a, b) SELECT g, g FROM generate_series(0, 999) g; ANALYZE t; SELECT * FROM t WHERE a < ANY (ARRAY[1]) AND b < ANY (ARRAY[1]); TRAP: failed Assert("so->numArrayKeys"), File: "nbtutils.c", Line: 560, PID: 3251267 Best regards, Alexander
Re: remaining sql/json patches
05.04.2024 10:09, Amit Langote wrote: Seems like it might be a pre-existing issue, because I can also reproduce the crash with: SELECT * FROM COALESCE(row(1)) AS (a int, b int); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> Backtrace: #0 __pthread_kill_implementation (threadid=281472845250592, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x806c4334 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78 #2 0x8067c73c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x80669034 in __GI_abort () at abort.c:79 #4 0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368 "!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion", fileName=0xcbb2c8 "nodeFunctionscan.c", lineNumber=379) at assert.c:54 That's strange, because I get the error (on master, 6f132ed69). With backtrace_functions = 'tupledesc_match', I see 2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR: function return row and query-specified return row do not match 2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL: Returned row contains 1 attribute, but query expects 2. 
2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE: tupledesc_match at execSRF.c:948:3 ExecMakeTableFunctionResult at execSRF.c:427:13 FunctionNext at nodeFunctionscan.c:94:5 ExecScanFetch at execScan.c:131:10 ExecScan at execScan.c:180:10 ExecFunctionScan at nodeFunctionscan.c:272:1 ExecProcNodeFirst at execProcnode.c:465:1 ExecProcNode at executor.h:274:9 (inlined by) ExecutePlan at execMain.c:1646:10 standard_ExecutorRun at execMain.c:363:3 ExecutorRun at execMain.c:305:1 PortalRunSelect at pquery.c:926:26 PortalRun at pquery.c:775:8 exec_simple_query at postgres.c:1282:3 PostgresMain at postgres.c:4684:27 BackendMain at backend_startup.c:57:2 pgarch_die at pgarch.c:847:1 BackendStartup at postmaster.c:3593:8 ServerLoop at postmaster.c:1674:6 main at main.c:184:3 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f37127f0e40] 2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT: SELECT * FROM COALESCE(row(1)) AS (a int, b int); That's why I had attributed the failure to JSON_TABLE(). Though SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int); really triggers the assert too. Sorry for the noise... Best regards, Alexander
Re: remaining sql/json patches
Hello Amit, 04.04.2024 15:02, Amit Langote wrote: Pushed after fixing these and a few other issues. I didn't include the testing function you proposed in your other email. It sounds useful for testing locally but will need some work before we can include it in the tree. I'll post the rebased 0002 tomorrow after addressing your comments. Please look at an assertion failure: TRAP: failed Assert("count <= tupdesc->natts"), File: "parse_relation.c", Line: 3048, PID: 1325146 triggered by the following query: SELECT * FROM JSON_TABLE('0', '$' COLUMNS (js int PATH '$')), COALESCE(row(1)) AS (a int, b int); Without JSON_TABLE() I get: ERROR: function return row and query-specified return row do not match DETAIL: Returned row contains 1 attribute, but query expects 2. Best regards, Alexander
Re: Recent 027_streaming_regress.pl hangs
Hello Andres, 26.03.2024 10:59, Andres Freund wrote: Late, will try to look more in the next few days. AFAICS, the last 027_streaming_regress.pl failures on calliphoridae, culicidae, tamandua occurred before 2024-03-27: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-03-26%2004%3A07%3A30 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-03-22%2013%3A26%3A21 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-03-24%2007%3A44%3A27 So it looks like that issue is resolved, but there is another, apparently performance-related issue: deadlock-parallel test failures. A recent one: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=canebrake&dt=2024-04-02%2022%3A20%3A22 test deadlock-parallel ... FAILED 345099 ms +isolationtester: canceling step d2a1 after 300 seconds step d2a1: <... completed> - sum -- -1 -(1 row) - ... The server log shows: 2024-04-02 23:56:45.353 UTC [3583878][client backend][5/530:0] LOG: statement: SET force_parallel_mode = on; ... SELECT lock_share(3,x) FROM bigt LIMIT 1; 2024-04-02 23:56:45.364 UTC [3583876][client backend][3/2732:0] LOG: execute isolationtester_waiting: SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}') 2024-04-02 23:56:45.364 UTC [3583876][client backend][3/2732:0] DETAIL: parameters: $1 = '3583878' ... 
2024-04-02 23:57:28.967 UTC [3583876][client backend][3/5097:0] LOG: execute isolationtester_waiting: SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}') 2024-04-02 23:57:28.967 UTC [3583876][client backend][3/5097:0] DETAIL: parameters: $1 = '3583877' 2024-04-02 23:57:29.016 UTC [3583877][client backend][4/530:0] LOG: statement: COMMIT; 2024-04-02 23:57:29.039 UTC [3583876][client backend][3/5098:0] LOG: execute isolationtester_waiting: SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}') 2024-04-02 23:57:29.039 UTC [3583876][client backend][3/5098:0] DETAIL: parameters: $1 = '3583879' ... 2024-04-03 00:02:29.096 UTC [3583876][client backend][3/9472:0] LOG: execute isolationtester_waiting: SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}') 2024-04-03 00:02:29.096 UTC [3583876][client backend][3/9472:0] DETAIL: parameters: $1 = '3583878' 2024-04-03 00:02:29.172 UTC [3905345][not initialized][:0] LOG: connection received: host=[local] 2024-04-03 00:02:29.240 UTC [3583878][client backend][5/530:0] ERROR: canceling statement due to user request The last step duration is 00:02:29.096 - 23:57:29.039 ~ 300 seconds (default max_step_wait for REL_15_STABLE- (for REL_16_STABLE+ the default value was increased to 360 by c99c67fc4)). The average deadlock-parallel duration for REL_15_STABLE on canebrake is around 128 seconds (for 140 runs I analyzed), but we can find also: test deadlock-parallel ... ok 377895 ms https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake=2024-03-27%2001%3A06%3A24=isolation-check test deadlock-parallel ... ok 302549 ms https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake=2023-11-06%2012%3A47%3A01=isolation-check test deadlock-parallel ... 
ok 255045 ms https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake&dt=2023-11-09%2010%3A02%3A59&stg=isolation-check A similar situation on phycodurus: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2024-02-11%2021:05:41 test deadlock-parallel ... FAILED 389381 ms The average deadlock-parallel duration for REL_13_STABLE on phycodurus is around 78 seconds (for 138 recent runs), but there were also: test deadlock-parallel ... ok 441736 ms https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=phycodurus&dt=2024-03-04%2015%3A19%3A04&stg=isolation-check test deadlock-parallel ... ok 187844 ms https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=phycodurus&dt=2023-11-03%2016%3A13%3A46&stg=isolation-check And also pogona, REL_14_STABLE: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2024-02-20%2003%3A50%3A49 test deadlock-parallel ... FAILED 425482 ms (I could reach a similar duration on a slowed-down VM, with JIT enabled as on these animals.) So, maybe these machines require a larger PGISOLATIONTIMEOUT, or there is still some OS/environment issue there? Best regards, Alexander
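For the record, the ~300-second step duration quoted earlier straddles midnight, so it is easy to misread from the log; a quick sanity check of the delta between the two quoted timestamps:

```python
from datetime import datetime

# The two isolationtester_waiting log entries quoted above; the second
# one falls past midnight, so the date part differs.
blocked_check = datetime.fromisoformat("2024-04-02 23:57:29.039")
last_check = datetime.fromisoformat("2024-04-03 00:02:29.096")

elapsed = (last_check - blocked_check).total_seconds()
print(elapsed)  # ~300.06, i.e. right at the default max_step_wait
```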
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Hello Alvaro, 27.02.2024 20:33, Alvaro Herrera wrote: Here's the complete set, with these two names using the singular. I've managed to trigger an assert added by 53c2a97a9. Please try the following script against a server compiled with -DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the define, it just simplifies reproducing...): # initdb & start ... createdb test echo " SELECT pg_current_xact_id() AS tx \gset SELECT format('CREATE TABLE t%s(i int)', g) FROM generate_series(1, 1022 - :tx) g \gexec BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT pg_current_xact_id(); SELECT pg_sleep(5); " | psql test & echo " SELECT pg_sleep(1); BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT 1 INTO a; COMMIT; BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT 2 INTO b; " | psql test It fails for me with the following stack trace: TRAP: failed Assert("LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE)"), File: "slru.c", Line: 366, PID: 21711 ExceptionalCondition at assert.c:52:13 SimpleLruZeroPage at slru.c:369:11 SerialAdd at predicate.c:921:20 SummarizeOldestCommittedSxact at predicate.c:1521:2 GetSerializableTransactionSnapshotInt at predicate.c:1787:16 GetSerializableTransactionSnapshot at predicate.c:1691:1 GetTransactionSnapshot at snapmgr.c:264:21 exec_simple_query at postgres.c:1162:4 ... Best regards, Alexander
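If I read predicate.c right (this is my own back-of-the-envelope sketch, not something stated in the thread), the magic constant 1022 in the script above aims at a SERIAL SLRU page boundary: with the default BLCKSZ of 8192 and an 8-byte SerCommitSeqNo, SERIAL_ENTRIESPERPAGE is 1024, so padding the xid counter makes the serializable transactions straddle the first page boundary and forces SerialAdd() to zero a fresh page:

```python
BLCKSZ = 8192          # default PostgreSQL block size (assumption)
SER_COMMIT_SEQ_NO = 8  # sizeof(SerCommitSeqNo), a uint64

entries_per_page = BLCKSZ // SER_COMMIT_SEQ_NO  # SERIAL_ENTRIESPERPAGE

def serial_page(xid: int) -> int:
    # Which SERIAL SLRU page holds this xid's commit-seqno entry
    return xid // entries_per_page

print(entries_per_page)                       # 1024
print(serial_page(1023), serial_page(1024))   # 0 1 -- adjacent xids, different pages
```

The two psql sessions consume a few xids of their own, which presumably is why the script pads only up to 1022 rather than 1024.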
Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
Hello Alvaro, 28.03.2024 18:58, Alvaro Herrera wrote: Grumble. I don't like initialization at declare time, so far from the code that depends on the value. But the alternative would have been to assign right where this blocks starts, an additional line. I pushed it like you had it. I've stumbled upon a test failure caused by the test query added in that commit: --- .../src/test/regress/expected/create_am.out 2024-03-28 12:14:11.700764888 -0400 +++ .../src/test/recovery/tmp_check/results/create_am.out 2024-03-31 03:10:28.172244122 -0400 @@ -549,7 +549,10 @@ ERROR: access method "btree" is not of type TABLE -- Other weird invalid cases that cause problems CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; -ERROR: "pg_am" is not partitioned +ERROR: deadlock detected +DETAIL: Process 3076180 waits for AccessShareLock on relation 1259 of database 16386; blocked by process 3076181. +Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; blocked by process 3076180. +HINT: See server log for query details. -- Drop table access method, which fails as objects depends on it DROP ACCESS METHOD heap2; ERROR: cannot drop access method heap2 because other objects depend on it 027_stream_regress_primary.log contains: 2024-03-31 03:10:26.728 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM FULL pg_class; ... 2024-03-31 03:10:26.797 EDT [3076180] pg_regress/create_am LOG: statement: CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; ... 2024-03-31 03:10:28.183 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM FULL pg_database; This simple demo confirms the issue: for ((i=1;i<=20;i++)); do echo "iteration $i" echo "VACUUM FULL pg_class;" | psql >psql-1.log & echo "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;" | psql >psql-2.log & wait done ... 
iteration 15 ERROR: "pg_am" is not partitioned iteration 16 ERROR: deadlock detected DETAIL: Process 2556377 waits for AccessShareLock on relation 1259 of database 16384; blocked by process 2556378. Process 2556378 waits for AccessShareLock on relation 2601 of database 16384; blocked by process 2556377. HINT: See server log for query details. ... Best regards, Alexander
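The DETAIL lines describe a textbook lock-ordering cycle: CREATE FOREIGN TABLE already holds a lock on pg_am (2601) and wants pg_class (1259), while VACUUM FULL pg_class holds 1259 and wants 2601. A minimal sketch of the wait-for-graph cycle check that the deadlock detector effectively performs (an illustration only, not PostgreSQL's actual algorithm):

```python
# Wait-for edges reconstructed from the DETAIL lines above:
# process -> the process it is blocked by.
waits_for = {
    3076180: 3076181,  # CREATE FOREIGN TABLE waits for pg_class (1259)
    3076181: 3076180,  # VACUUM FULL pg_class waits for pg_am (2601)
}

def in_deadlock(start: int) -> bool:
    """Follow wait-for edges from `start`; revisiting a process means a cycle."""
    seen = set()
    p = start
    while p in waits_for:
        if p in seen:
            return True
        seen.add(p)
        p = waits_for[p]
    return False

print(in_deadlock(3076180))  # True -- hence "deadlock detected"
```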
Re: To what extent should tests rely on VACUUM ANALYZE?
Hello Tom, 29.03.2024 16:51, Tom Lane wrote: Alexander Lakhin writes: I think that deviation can be explained by the fact that cost_index() takes baserel->allvisfrac (derived from pg_class.relallvisible) into account for the index-only-scan case, and I see the following difference when a test run fails: relname | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count --+--+---+---++--+--- - tenk1 | 345 | 1 | 345 | | 0 | 0 + tenk1 | 345 | 1 | 305 | | 0 | 0 Ouch. So what's triggering that? The intention of test_setup surely is to provide a uniform starting point. Thanks for your attention to the issue! Please try the attached... Best regards, Alexanderdiff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index f0f8d4259c..d60c117378 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -5041,6 +5041,7 @@ ConditionalLockBufferForCleanup(Buffer buffer) Assert(BufferIsValid(buffer)); +if (rand() % 10 == 0) return false; if (BufferIsLocal(buffer)) { refcount = LocalRefCount[-buffer - 1]; diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 70ab47a92f..93c1f4f08d 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -1932,6 +1932,32 @@ ORDER BY unique1; 42 (3 rows) +explain (verbose) +SELECT thousand, tenthous FROM tenk1 +WHERE thousand < 2 AND tenthous IN (1001,3000) +ORDER BY thousand; + QUERY PLAN +-- + Index Only Scan using tenk1_thous_tenthous on public.tenk1 (cost=0.29..4.71 rows=1 width=8) + Output: thousand, tenthous + Index Cond: (tenk1.thousand < 2) + Filter: (tenk1.tenthous = ANY ('{1001,3000}'::integer[])) +(4 rows) + +select c.relname,c.relpages,c.reltuples,c.relallvisible,i.indisvalid,s.autovacuum_count,s.autoanalyze_count +from pg_class c +left join pg_stat_all_tables s on c.oid = s.relid +left join pg_index i on c.oid = i.indexrelid +where 
c.relname like 'tenk1%' order by c.relname; + relname| relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count +--+--+---+---++--+--- + tenk1| 345 | 1 | 345 ||0 | 0 + tenk1_hundred| 11 | 1 | 0 | t | | + tenk1_thous_tenthous | 30 | 1 | 0 | t | | + tenk1_unique1| 30 | 1 | 0 | t | | + tenk1_unique2| 30 | 1 | 0 | t | | +(5 rows) + explain (costs off) SELECT thousand, tenthous FROM tenk1 WHERE thousand < 2 AND tenthous IN (1001,3000) diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql index d49ce9f300..732b54fb35 100644 --- a/src/test/regress/sql/create_index.sql +++ b/src/test/regress/sql/create_index.sql @@ -765,6 +765,17 @@ SELECT unique1 FROM tenk1 WHERE unique1 IN (1,42,7) ORDER BY unique1; +explain (verbose) +SELECT thousand, tenthous FROM tenk1 +WHERE thousand < 2 AND tenthous IN (1001,3000) +ORDER BY thousand; + +select c.relname,c.relpages,c.reltuples,c.relallvisible,i.indisvalid,s.autovacuum_count,s.autoanalyze_count +from pg_class c +left join pg_stat_all_tables s on c.oid = s.relid +left join pg_index i on c.oid = i.indexrelid +where c.relname like 'tenk1%' order by c.relname; + explain (costs off) SELECT thousand, tenthous FROM tenk1 WHERE thousand < 2 AND tenthous IN (1001,3000)
Re: To what extent should tests rely on VACUUM ANALYZE?
29.03.2024 11:59, Alexander Lakhin wrote: But it looks like subselect is not the only test that can fail due to vacuum instability. I see that create_index also suffers from cranky ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0) return false; ), although it is placed in parallel_schedule before sanity_check, so this failure needs another explanation: - QUERY PLAN - Index Only Scan using tenk1_thous_tenthous on tenk1 - Index Cond: (thousand < 2) - Filter: (tenthous = ANY ('{1001,3000}'::integer[])) -(3 rows) + QUERY PLAN +-- + Sort + Sort Key: thousand + -> Index Only Scan using tenk1_thous_tenthous on tenk1 + Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[]))) +(4 rows) I think this deviation can be explained by the fact that cost_index() takes baserel->allvisfrac (derived from pg_class.relallvisible) into account for the index-only-scan case, and I see the following difference when a test run fails: relname | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count --+--+---+---++--+--- - tenk1 | 345 | 1 | 345 | | 0 | 0 + tenk1 | 345 | 1 | 305 | | 0 | 0 Best regards, Alexander
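To make the allvisfrac connection concrete (my own illustration of the mechanism, with simplified numbers): cost_index() expects an index-only scan to still visit the heap for roughly the non-all-visible fraction of pages, so relallvisible dropping from 345 to 305 re-introduces heap-fetch costs and can push the planner off the IOS plan:

```python
relpages = 345

def heap_fetch_fraction(relallvisible: int) -> float:
    # allvisfrac is derived from relallvisible / relpages; an index-only
    # scan pays heap access only for the non-all-visible remainder.
    allvisfrac = relallvisible / relpages
    return 1.0 - allvisfrac

print(heap_fetch_fraction(345))            # 0.0 -- no heap fetches expected
print(round(heap_fetch_fraction(305), 3))  # 0.116 -- ~12% of fetches hit the heap
```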
Re: To what extent should tests rely on VACUUM ANALYZE?
29.03.2024 11:59, Alexander Lakhin wrote: This simple change fixes the issue for me: -VACUUM ANALYZE tenk2; +VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2; I'm sorry, I wasn't persevering enough when testing that... After more test runs, I see that in fact it doesn't help. Best regards, Alexander
Re: To what extent should tests rely on VACUUM ANALYZE?
28.03.2024 20:33, Tom Lane wrote: But I'm a bit confused - how come the estimates do change at all? The analyze simply fetches 30k rows, and tenk only has 10k of them. So we should have *exact* numbers, and it should be exactly the same for all the analyze runs. So how come it changes like this? It's plausible that the VACUUM ANALYZE done by test_setup fails ConditionalLockBufferForCleanup() sometimes because of concurrent activity like checkpointer writes. I'm not quite sure how we get from that to the observed symptom though. Maybe the VACUUM needs DISABLE_PAGE_SKIPPING? Yeah, the way from ConditionalLockBufferForCleanup() returning false to reltuples < 1 is not one-step, as I thought initially. There is also sanity_check doing VACUUM in between. So, effectively the troublesome scenario is: VACUUM ANALYZE tenk2; -- with cleanup lock not granted for some blocks VACUUM tenk2; In this scenario, lazy_scan_heap() -> vac_estimate_reltuples() is called twice. First, with rel_pages: 384, vacrel->scanned_pages: 384, vacrel->live_tuples: 1, and it results in vacrel->new_live_tuples = 1. And second, with rel_pages: 345, vacrel->scanned_pages: 80, vacrel->live_tuples: 2315 (for instance), and we get vacrel->new_live_tuples = 9996. With unmodified ConditionalLockBufferForCleanup() the second call is performed with rel_pages: 345, vacrel->scanned_pages: 1, vacrel->live_tuples: 24, and it returns 1. This simple change fixes the issue for me: -VACUUM ANALYZE tenk2; +VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2; But it looks like subselect is not the only test that can fail due to vacuum instability. 
I see that create_index also suffers from cranky ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0) return false; ), although it is placed in parallel_schedule before sanity_check, so this failure needs another explanation: - QUERY PLAN - Index Only Scan using tenk1_thous_tenthous on tenk1 - Index Cond: (thousand < 2) - Filter: (tenthous = ANY ('{1001,3000}'::integer[])) -(3 rows) + QUERY PLAN +-- + Sort + Sort Key: thousand + -> Index Only Scan using tenk1_thous_tenthous on tenk1 + Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[]))) +(4 rows) Best regards, Alexander
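If I'm reading vacuum.c correctly, the extrapolation behind those vac_estimate_reltuples() numbers can be sketched as follows. This is a simplified model of the logic, and I'm assuming the stats before the second VACUUM were tenk2's usual 10000 tuples in 345 pages, which reproduces the quoted 9996:

```python
def vac_estimate_reltuples(total_pages, scanned_pages,
                           scanned_tuples, old_rel_pages, old_rel_tuples):
    # Simplified model of vac_estimate_reltuples(): a full scan trusts the
    # fresh count; a partial scan assumes the unscanned pages still have
    # the previously recorded tuple density.
    if scanned_pages >= total_pages:
        return scanned_tuples
    old_density = old_rel_tuples / old_rel_pages
    unscanned_pages = total_pages - scanned_pages
    return int(old_density * unscanned_pages + scanned_tuples + 0.5)

# The second VACUUM reaches only 80 of 345 pages (cleanup locks denied
# elsewhere) and counts 2315 live tuples there:
print(vac_estimate_reltuples(345, 80, 2315, 345, 10000))  # 9996
```

So a partial scan does not lose the old density, which is why a single denied cleanup lock only skews reltuples slightly, yet enough to flip a borderline plan.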
To what extent should tests rely on VACUUM ANALYZE?
Hello hackers, When running multiple 027_stream_regress.pl test instances in parallel (and with aggressive autovacuum) on a rather slow machine, I encountered test failures due to the subselect test instability, just like the following failures on buildfarm: 1) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-03-27%2010%3A16%3A12 --- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/subselect.out 2024-03-19 22:20:34.435867114 + +++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/subselect.out 2024-03-27 10:28:38.185776605 + @@ -2067,16 +2067,16 @@ QUERY PLAN - Hash Join - Hash Cond: (c.odd = b.odd) + Hash Cond: (c.hundred = a.hundred) -> Hash Join - Hash Cond: (a.hundred = c.hundred) - -> Seq Scan on tenk1 a + Hash Cond: (b.odd = c.odd) + -> Seq Scan on tenk2 b -> Hash -> HashAggregate Group Key: c.odd, c.hundred -> Seq Scan on tenk2 c -> Hash - -> Seq Scan on tenk2 b + -> Seq Scan on tenk1 a (11 rows) 2) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-03-27%2009%3A49%3A38 (That query was added recently (by 9f1337639 from 2023-02-15) and the failure evidently depends on timing, so the number of failures I could find on buildfarm is moderate for now.) 
With the subselect test modified as in attached, I could see what makes the plan change: - -> Seq Scan on public.tenk2 c (cost=0.00..445.00 rows=1 width=8) + -> Seq Scan on public.tenk2 c (cost=0.00..444.95 rows=9995 width=8) relname | relpages | reltuples | autovacuum_count | autoanalyze_count -+--+---+--+--- - tenk2 | 345 | 1 | 0 | 0 + tenk2 | 345 | 9995 | 0 | 0 Using the trick Thomas proposed in [1] (see my modification attached), I could reproduce the failure easily on my workstation with no specific conditions: 2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup LOG: !!!ConditionalLockBufferForCleanup() returning false 2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup CONTEXT: while scanning block 29 of relation "public.tenk2" 2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk2; ... relname | relpages | reltuples | autovacuum_count | autoanalyze_count -+--+---+--+--- - tenk2 | 345 | 1 | 0 | 0 + tenk2 | 345 | 9996 | 0 | 0 (1 row) So it looks to me like a possible cause of the failure, and I wonder whether checks for query plans should be immune to such changes or results of VACUUM ANALYZE should be 100% stable? 
[1] https://www.postgresql.org/message-id/CA%2BhUKGKYNHmL_DhmVRiidHv6YLAL8jViifwwn2ABY__Y3BCphg%40mail.gmail.com Best regards, Alexanderdiff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 29b11f11aa..6a7bb6b7a9 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -2079,6 +2079,32 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd); -> Seq Scan on tenk2 b (11 rows) +explain (verbose) +SELECT * FROM tenk1 A INNER JOIN tenk2 B +ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd); + QUERY PLAN + + Hash Join (cost=1087.50..13845.00 rows=100 width=488) + Output: a.unique1, a.unique2, a.two, a.four, a.ten, a.twenty, a.hundred, a.thousand, a.twothousand, a.fivethous, a.tenthous, a.odd, a.even, a.stringu1, a.stringu2, a.string4, b.unique1, b.unique2, b.two, b.four, b.ten, b.twenty, b.hundred, b.thousand, b.twothousand, b.fivethous, b.tenthous, b.odd, b.even, b.stringu1, b.stringu2, b.string4 + Hash Cond: (c.odd = b.odd) + -> Hash Join (cost=517.50..2000.00 rows=1 width=248) + Output: a.unique1, a.unique2, a.two, a.four, a.ten, a.twenty,
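Coming back to the numbers in this report: the quoted Seq Scan cost shift (445.00 vs 444.95) can be cross-checked with a rough cost model. This is my own arithmetic, assuming the default seq_page_cost = 1.0 and cpu_tuple_cost = 0.01 and ignoring per-row qual costs; a plain sequential scan's total cost is then pages * seq_page_cost + tuples * cpu_tuple_cost:

```python
SEQ_PAGE_COST = 1.0    # default seq_page_cost
CPU_TUPLE_COST = 0.01  # default cpu_tuple_cost

def seq_scan_cost(relpages: int, reltuples: int) -> float:
    # Rough model of cost_seqscan(): disk cost plus per-tuple CPU cost
    # (qual evaluation cost omitted for simplicity).
    return relpages * SEQ_PAGE_COST + reltuples * CPU_TUPLE_COST

print(seq_scan_cost(345, 10000))  # 445.0  -- reltuples at its correct value
print(seq_scan_cost(345, 9995))   # ~444.95 -- reltuples skewed by the vacuum
```

Both EXPLAIN numbers fall out of the same formula, differing only in the reltuples input, which supports the theory that the unstable VACUUM ANALYZE result is what flips the join order.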
Re: Properly pathify the union planner
Hello David, 25.03.2024 04:43, David Rowley wrote: I didn't see that as a reason not to push this patch as this occurs both with and without this change, so I've now pushed this patch. Please look at a new assertion failure that is triggered by the following query: SELECT count(*) FROM ( WITH q1(x) AS (SELECT 1) SELECT FROM q1 UNION SELECT FROM q1 ) qu; TRAP: failed Assert("lg != NULL"), File: "planner.c", Line: 7941, PID: 1133017 Best regards, Alexander
Re: Parallel Aggregates for string_agg and array_agg
Hello David, 23.01.2023 07:37, David Rowley wrote: I've now pushed this. I've discovered that the test query: -- Ensure parallel aggregation is actually being used. explain (costs off) select * from v_pagg_test order by y; added by 16fd03e95 fails sometimes. For instance: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=urutu&dt=2024-03-19%2021%3A04%3A05 --- /home/bf/bf-build/urutu/HEAD/pgsql/src/test/regress/expected/aggregates.out 2024-02-24 06:42:47.039073180 + +++ /home/bf/bf-build/urutu/HEAD/pgsql.build/src/test/regress/results/aggregates.out 2024-03-19 22:24:18.155876135 + @@ -1993,14 +1993,16 @@ Sort Key: pagg_test.y, (((unnest(regexp_split_to_array((string_agg((pagg_test.x)::text, ','::text)), ','::text::integer) -> Result -> ProjectSet - -> Finalize HashAggregate + -> Finalize GroupAggregate ... https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2024-02-28%2007%3A38%3A27 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-08%2008%3A47%3A45 I suspect that these failures are caused by autovacuum. I could reproduce this plan change when running multiple tests in parallel, and also with the attached test patch applied (several sleeps are needed for autovacuum/relation_needs_vacanalyze() to get a non-zero mod_since_analyze value from pgstat): TEMP_CONFIG=/tmp/temp.config TESTS="test_setup create_index create_aggregate aggregates" make -s check-tests where /tmp/temp.config is: autovacuum_naptime = 1 log_autovacuum_min_duration = 0 log_min_messages = INFO log_min_error_statement = log log_statement = 'all' With EXPLAIN (VERBOSE), I see a slight change of the Seq Scan cost/rows estimate: ... -> Parallel Seq Scan on public.pagg_test (cost=0.00..48.99 rows=2599 width=8) vs .. 
-> Parallel Seq Scan on public.pagg_test (cost=0.00..48.00 rows=2500 width=8) (after automatic analyze of table "regression.public.pagg_test") Best regards, Alexanderdiff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index d54a255e58..3de98c916d 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -1931,6 +1931,12 @@ select string_agg(v, decode('ee', 'hex')) from bytea_test_table; drop table bytea_test_table; -- Test parallel string_agg and array_agg create table pagg_test (x int, y int); +select pg_sleep(3); + pg_sleep +-- + +(1 row) + insert into pagg_test select (case x % 4 when 1 then null else x end), x % 10 from generate_series(1,5000) x; @@ -1967,6 +1973,12 @@ from ( ) a1 ) a2 group by y; +select pg_sleep(3); + pg_sleep +-- + +(1 row) + -- Ensure results are correct. select * from v_pagg_test order by y; y | tmin | tmax | tndistinct | bmin | bmax | bndistinct | amin | amax | andistinct | aamin | aamax | aandistinct @@ -1983,6 +1995,12 @@ select * from v_pagg_test order by y; 9 | 19 | 4999 |250 | 1019 | 999 |250 | 19 | 4999 |250 |19 | 4999 | 250 (10 rows) +select pg_sleep(3); + pg_sleep +-- + +(1 row) + -- Ensure parallel aggregation is actually being used. explain (costs off) select * from v_pagg_test order by y; QUERY PLAN diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql index 441e01d150..25dd90ec65 100644 --- a/src/test/regress/sql/aggregates.sql +++ b/src/test/regress/sql/aggregates.sql @@ -750,6 +750,7 @@ drop table bytea_test_table; -- Test parallel string_agg and array_agg create table pagg_test (x int, y int); +select pg_sleep(3); insert into pagg_test select (case x % 4 when 1 then null else x end), x % 10 from generate_series(1,5000) x; @@ -789,9 +790,11 @@ from ( ) a2 group by y; +select pg_sleep(3); -- Ensure results are correct. 
select * from v_pagg_test order by y; +select pg_sleep(3); -- Ensure parallel aggregation is actually being used. explain (costs off) select * from v_pagg_test order by y;
Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
Hello Alvaro, 21.03.2024 15:07, Alvaro Herrera wrote: Given that Michaël is temporarily gone, I propose to push the attached tomorrow. Please look at a new anomaly introduced with 374c7a229. Starting from that commit, the following erroneous query: CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; triggers an assertion failure: TRAP: failed Assert("relation->rd_rel->relam == InvalidOid"), File: "relcache.c", Line: 1219, PID: 3706301 with the stack trace: ... #4 0x7fe53ced67f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x55f28555951e in ExceptionalCondition (conditionName=conditionName@entry=0x55f285744788 "relation->rd_rel->relam == InvalidOid", fileName=fileName@entry=0x55f285743f1c "relcache.c", lineNumber=lineNumber@entry=1219) at assert.c:66 #6 0x55f285550450 in RelationBuildDesc (targetRelId=targetRelId@entry=16385, insertIt=insertIt@entry=false) at relcache.c:1219 #7 0x55f285550769 in RelationClearRelation (relation=relation@entry=0x7fe5310dd178, rebuild=rebuild@entry=true) at relcache.c:2667 #8 0x55f285550c41 in RelationFlushRelation (relation=0x7fe5310dd178) at relcache.c:2850 #9 0x55f285550ca0 in RelationCacheInvalidateEntry (relationId=) at relcache.c:2921 #10 0x55f285542551 in LocalExecuteInvalidationMessage (msg=0x55f2861b3160) at inval.c:738 #11 0x55f28554159b in ProcessInvalidationMessages (group=0x55f2861b2e6c, func=func@entry=0x55f2855424a8 ) at inval.c:518 #12 0x55f285542740 in CommandEndInvalidationMessages () at inval.c:1180 #13 0x55f28509cbbd in AtCCI_LocalCache () at xact.c:1550 #14 0x55f28509e88e in CommandCounterIncrement () at xact.c:1116 #15 0x55f2851d0c8b in DefineRelation (stmt=stmt@entry=0x55f2861803b0, relkind=relkind@entry=102 'f', ownerId=10, ownerId@entry=0, typaddress=typaddress@entry=0x0, queryString=queryString@entry=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;") at tablecmds.c:1008 #16 0x55f28540945d in ProcessUtilitySlow (pstate=pstate@entry=0x55f2861a9dc0, 
pstmt=pstmt@entry=0x55f286180510, queryString=queryString@entry=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x55f2861807d0, qc=0x7fff15b5d7c0) at utility.c:1203 #17 0x55f28540911f in standard_ProcessUtility (pstmt=0x55f286180510, queryString=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;", readOnlyTree=, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55f2861807d0, qc=0x7fff15b5d7c0) at utility.c:1067 ... On 374c7a229~1 it fails with ERROR: "pg_am" is not partitioned Best regards, Alexander
Re: Built-in CTYPE provider
Hello Jeff, 21.03.2024 03:13, Jeff Davis wrote: On Tue, 2024-03-19 at 13:41 +0100, Peter Eisentraut wrote: * v25-0002-Support-C.UTF-8-locale-in-the-new-builtin-collat.patch Looks ok. Committed. Please look at a Valgrind-detected error caused by the following query (starting from f69319f2f): SELECT lower('Π' COLLATE pg_c_utf8); ==00:00:00:03.487 1429669== Invalid read of size 1 ==00:00:00:03.487 1429669== at 0x7C64A5: convert_case (unicode_case.c:107) ==00:00:00:03.487 1429669== by 0x7C: unicode_strlower (unicode_case.c:70) ==00:00:00:03.487 1429669== by 0x66B218: str_tolower (formatting.c:1698) ==00:00:00:03.488 1429669== by 0x6D6C55: lower (oracle_compat.c:55) Best regards, Alexander
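For context on why a case converter like convert_case() may need to examine more than the current code point (my own illustration using Python's built-in Unicode case mapping, not the pg_c_utf8 code itself, and possibly unrelated to the exact out-of-bounds read): the Greek capital sigma lowercases differently depending on whether it ends a word, so a correct lower() needs lookahead beyond the character being converted:

```python
# Unconditional mapping: capital pi always lowercases to pi.
assert "Π".lower() == "π"

# Context-sensitive mapping: sigma becomes final sigma (ς) at word end
# but ordinary sigma (σ) elsewhere, so the converter must peek ahead.
print("ΟΔΥΣΣΕΥΣ".lower())  # οδυσσευς -- medial σ, final ς
```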
Re: Test 031_recovery_conflict.pl is not immune to autovacuum
Hello Melanie, 20.03.2024 16:15, Melanie Plageman wrote: Seems like we could just add autovacuum_enabled=false to the table like this: diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl index d87efa823fd..65bc858c02d 100644 --- a/src/test/recovery/t/031_recovery_conflict.pl +++ b/src/test/recovery/t/031_recovery_conflict.pl @@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1"; my $table2 = "test_recovery_conflict_table2"; $node_primary->safe_psql( $test_db, qq[ -CREATE TABLE ${table1}(a int, b int); +CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false); INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i; CREATE TABLE ${table2}(a int, b int); ]); Thanks for paying attention to it! With such a modification applied, I've got another failure (on iteration 2): [13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary done timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318. # Postmaster PID for node "primary" is 1523036 ### Stopping node "primary" using mode immediate 031_recovery_conflict_standby.log really doesn't contain the expected message. I can share log files from successful and failed test runs, if they can be helpful, or I'll investigate this case today/tomorrow. Best regards, Alexander
Test 031_recovery_conflict.pl is not immune to autovacuum
Hello hackers, Among the many recoveryCheck (more concretely, 027_stream_regress) failures that occurred on a number of buildfarm animals after switching to meson, which can be explained by timeouts, I saw a different failure on adder: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-03-18%2023%3A43%3A00 [23:48:52.521](9.831s) ok 13 - startup deadlock: cursor holding conflicting pin, also waiting for lock, established [23:55:13.749](381.228s) # poll_query_until timed out executing this query: # # SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted; # # expecting this output: # waiting # last actual query output: # # with stderr: [23:55:13.763](0.013s) not ok 14 - startup deadlock: lock acquisition is waiting and I suspect that it might be caused by autovacuum. I've managed to reproduce it locally (running 10 tests in parallel on a 2-core VM with disk bandwidth limited to 80MB/sec, I get failures on iterations 10, 1, 3) and observed the following (with wal_debug = on): 031_recovery_conflict_standby.log: 2024-03-20 04:12:06.519 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT a FROM test_recovery_conflict_table1; 2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: FETCH FORWARD FROM test_recovery_conflict_cursor; 2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: SELECT * FROM test_recovery_conflict_table2; ... 
2024-03-20 04:12:07.073 UTC|||65fa620d.111ec8|LOG: REDO @ 0/3438360; LSN 0/3438460: prev 0/3438338; xid 0; len 9; blkref #0: rel 1663/16385/16392, blk 0 - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|LOG: recovery still waiting after 11.241 ms: recovery conflict on buffer pin
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|CONTEXT: WAL redo at 0/3438360 for Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]; blkref #0: rel 1663/16385/16392, blk 0
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|ERROR: canceling statement due to conflict with recovery at character 15
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|DETAIL: User transaction caused buffer deadlock with recovery.
...
2024-03-20 04:12:08.093 UTC|vagrant|postgres|65fa6216.111f1a|LOG: statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;

031_recovery_conflict_primary.log:
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG: Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG: Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|LOG: INSERT @ 0/3438460: - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
...
2024-03-20 04:12:05.981 UTC|||65fa6215.111f02|LOG: automatic vacuum of table "test_db.public.test_recovery_conflict_table1": index scans: 0

The corresponding fragment of 031_recovery_conflict.pl:
$res = $psql_standby->query_until(
	qr/^1$/m, qq[
	BEGIN;
	-- hold pin
	DECLARE $cursor1 CURSOR FOR SELECT a FROM $table1;
	FETCH FORWARD FROM $cursor1;
	-- wait for lock held by prepared transaction
	SELECT * FROM $table2;
	]);
ok(1, "$sect: cursor holding conflicting pin, also
Re: Recent 027_streaming_regress.pl hangs
14.03.2024 23:56, Tom Lane wrote:
> Thomas Munro writes:
>> On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote:
>>> Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to the switch to meson) conditions?
>> But you're right that under meson the test takes a lot longer, I guess due to increased concurrency:
> What it seems to be is highly variable. Looking at calliphoridae's last half dozen successful runs, I see reported times for 027_stream_regress anywhere from 183 to 704 seconds. I wonder what else is happening on that machine. Also, this is probably not helping anything:
> 'extra_config' => { ...
> 'fsync = on'
> I would suggest turning that off and raising wait_timeout a good deal, and then we'll see if calliphoridae gets any more stable.

I could reproduce similar failures with PG_TEST_EXTRA=wal_consistency_checking only, running 5 tests in parallel on a slowed-down VM, so that the test duration increased to ~1900 seconds, but perhaps that buildfarm machine has a different bottleneck (I/O?), or its concurrent workload is not as uniform as in my experiments.

Meanwhile, I've analyzed the failed test logs from the buildfarm and calculated the percentage of WAL replayed before the timeout. For instance, for one of the failures:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2022%3A36%3A40
standby_1.log:
2024-03-18 22:38:22.743 UTC [2010896][walreceiver][:0] LOG: started streaming WAL from primary at 0/300 on timeline 1
...
2024-03-18 22:50:02.439 UTC [2004203][checkpointer][:0] LOG: recovery restart point at 0/E00E030
2024-03-18 22:50:02.439 UTC [2004203][checkpointer][:0] DETAIL: Last completed transaction was at log time 2024-03-18 22:41:26.647756+00.
2024-03-18 22:50:12.841 UTC [2010896][walreceiver][:0] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly

primary.log:
2024-03-18 22:38:23.754 UTC [2012240][client backend][3/3:0] LOG: statement: GRANT ALL ON SCHEMA public TO public;
# ^^^ One of the first records produced by `make check`
...
2024-03-18 22:41:26.647 UTC [2174047][client backend][10/752:0] LOG: statement: ALTER VIEW my_property_secure SET (security_barrier=false);
# ^^^ A record near the last completed transaction on the standby
...
2024-03-18 22:44:13.226 UTC [2305844][client backend][22/3784:0] LOG: statement: DROP TABLESPACE regress_tblspace_renamed;
# ^^^ One of the last records produced by `make check`

\set t0 '22:38:23.754'
\set t1 '22:44:13.226'
\set tf '22:41:26.647756'
select extract(epoch from (:'tf'::time - :'t0'::time)) / extract(epoch from (:'t1'::time - :'t0'::time));
~52%

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2018%3A58%3A58
~48%
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2016%3A41%3A13
~43%
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2015%3A47%3A09
~36%
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-15%2011%3A24%3A38
~87%
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-17%2021%3A55%3A41
~36%

So it still looks like a performance-related issue to me. And yes, switching fsync = off -> on greatly increases (~3x) the overall test duration in my environment.

Best regards,
Alexander
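[Editor's note] The psql computation above can be reproduced outside the server; a minimal Python sketch using the three timestamps quoted from the logs (t0 = first `make check` record on the primary, t1 = last one, tf = last transaction replayed on the standby):

```python
from datetime import datetime

# Timestamps taken from the failed run's logs quoted above
fmt = "%H:%M:%S.%f"
t0 = datetime.strptime("22:38:23.754", fmt)    # first `make check` record
t1 = datetime.strptime("22:44:13.226", fmt)    # last `make check` record
tf = datetime.strptime("22:41:26.647756", fmt) # last transaction replayed

# Fraction of the primary's workload replayed before the timeout;
# dividing two timedeltas yields a plain float in Python 3
replayed = (tf - t0) / (t1 - t0)
print(f"{replayed:.0%}")  # 52%
```

The same arithmetic applied to the other failures' logs gives the ~48%, ~43%, ~36%, and ~87% figures listed above.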
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hello Anton,

14.03.2024 23:36, Anton A. Melnikov wrote:
> On 13.03.2024 10:41, Anton A. Melnikov wrote:
>> Here is a version updated for the current master.
> During patch updating I mistakenly added double counting of deallocated blocks. That's why the tests in the patch tester failed. Fixed it and squashed fix 0002 with 0001. Here is the fixed version.

Please try the following with the patches applied:
echo "shared_buffers = '1MB'
max_total_backend_memory = '10MB'" > /tmp/extra.config
CPPFLAGS="-Og" ./configure --enable-tap-tests --enable-debug --enable-cassert ...
TEMP_CONFIG=/tmp/extra.config make check

It fails for me as follows:
...
# postmaster did not respond within 60 seconds, examine ".../src/test/regress/log/postmaster.log" for the reason
...
src/test/regress/log/postmaster.log contains:
...
TRAP: failed Assert("ret != NULL"), File: "mcxt.c", Line: 1327, PID: 4109270
TRAP: failed Assert("ret != NULL"), File: "mcxt.c", Line: 1327, PID: 4109271
postgres: autovacuum launcher (ExceptionalCondition+0x69)[0x55ce441fcc6e]
postgres: autovacuum launcher (palloc0+0x0)[0x55ce4422eb67]
postgres: logical replication launcher (ExceptionalCondition+0x69)[0x55ce441fcc6e]
postgres: autovacuum launcher (InitDeadLockChecking+0xa6)[0x55ce4408a6f0]
postgres: logical replication launcher (palloc0+0x0)[0x55ce4422eb67]
postgres: logical replication launcher (InitDeadLockChecking+0x45)[0x55ce4408a68f]
postgres: autovacuum launcher (InitProcess+0x600)[0x55ce4409c6f2]
postgres: logical replication launcher (InitProcess+0x600)[0x55ce4409c6f2]
postgres: autovacuum launcher (+0x44b4e2)[0x55ce43ff24e2]
...
grep TRAP src/test/regress/log/postmaster.log | wc -l
445

Best regards,
Alexander
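[Editor's note] The failure mode can be illustrated abstractly. The following Python sketch is a hypothetical model (not the patch's actual code) of a global allocation budget like the proposed max_total_backend_memory, showing why an allocation on a critical startup path can come back NULL and trip the mcxt.c Assert:

```python
class MemoryBudget:
    """Hypothetical model of a cluster-wide allocation limit: once the
    shared budget is exhausted, palloc0-style calls fail, which must not
    be allowed on startup paths such as InitDeadLockChecking()."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def palloc0(self, size):
        if self.used + size > self.limit:
            return None  # the caller's Assert("ret != NULL") fires here
        self.used += size
        return bytearray(size)

MB = 1024 * 1024
budget = MemoryBudget(limit=10 * MB)   # max_total_backend_memory = '10MB'
workload = budget.palloc0(9 * MB)      # regular backends consume the budget
startup = budget.palloc0(2 * MB)       # a launcher starting up gets None
print(startup is None)                 # True
```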
Re: Recent 027_streaming_regress.pl hangs
Hello Thomas and Michael,

14.03.2024 06:16, Thomas Munro wrote:
> Yeah, I was wondering if its checkpoint delaying logic might have got the checkpointer jammed or something like that, but I don't currently see how. Yeah, the replay of bulk newpages could be relevant, but it's not exactly new technology. One thing I wondered about is whether the Perl "wait for catchup" thing, which generates large volumes of useless log, could be somehow changed to actually show the progress after some time. Something like "I'm still waiting for this replica to reach LSN X, but it has so far only reported LSN Y, and here's a dump of the WAL around there"?

I have perhaps reproduced the issue here (at least I'm seeing something similar) and am going to investigate it in the coming days, but what confuses me now is the duration of poll_query_until. For the failure you referenced:
[15:55:54.740](418.725s) # poll_query_until timed out executing this query:

And a couple of others:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-08%2000%3A34%3A06
[00:45:57.747](376.159s) # poll_query_until timed out executing this query:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-04%2016%3A32%3A17
[16:45:24.870](407.970s) # poll_query_until timed out executing this query:

Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to the switch to meson) conditions?

Best regards,
Alexander
Re: pg16: XX000: could not find pathkey item to sort
Hello David,

09.10.2023 07:13, David Rowley wrote:
> On Mon, 9 Oct 2023 at 12:42, David Rowley wrote:
>> Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case".
> I've now pushed the patch that trims off the Pathkeys for the ORDER BY / DISTINCT aggregates.

I've stumbled upon the same error, but this time it apparently has another cause. It can be reproduced (on REL_16_STABLE and master) as follows:
CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE td PARTITION OF t DEFAULT;
CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2);
SET enable_partitionwise_aggregate = on;
SET parallel_setup_cost = 0;
SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
ERROR: could not find pathkey item to sort

`git bisect` for this anomaly blames the same commit 1349d2790.

Best regards,
Alexander
Re: Spurious pgstat_drop_replslot() call
Hello Bertrand and Michael,

12.03.2024 09:17, Bertrand Drouvot wrote:
>> May I suggest the attached additions with LWLockHeldByMeInMode to make sure that the stats are dropped and created while we hold ReplicationSlotAllocationLock?
> Yeah, makes fully sense and looks good to me.

Sorry for going a bit off-topic, but I've remembered an anomaly with a similar assert:
https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org
Maybe you would find it worth considering while working in this area... (I've just run that reproducer on b36fbd9f8 and confirmed that the assertion failure is still here.)

Best regards,
Alexander
Re: DROP DATABASE is interruptible
Hi,

13.07.2023 23:52, Andres Freund wrote:
> Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst part. But also a lot of other conflicts in tests... Took me 5-6 hours or so. But I now finally pushed the fixes. Hope the buildfarm agrees with it... Thanks for the review!

I've discovered that the test 037_invalid_database, introduced with c66a7d75e, hangs when the server is built with -DCLOBBER_CACHE_ALWAYS or run with debug_discard_caches = 1 set via TEMP_CONFIG:
echo "debug_discard_caches = 1" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ PROVE_TESTS="t/037*"
# +++ tap check in src/test/recovery +++
[09:05:48] t/037_invalid_database.pl .. 6/?

regress_log_037_invalid_database ends with:
[09:05:51.622](0.021s) # issuing query via background psql:
# CREATE DATABASE regression_invalid_interrupt;
# BEGIN;
# LOCK pg_tablespace;
# PREPARE TRANSACTION 'lock_tblspc';
[09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion

I see two backends waiting:
law 2420132 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
law 2420135 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] startup waiting

and the latter's stack trace:
#0 0x7f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80, nevents=1) at latch.c:1570
#2 0x563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1, wait_event_info=50331648) at latch.c:1516
#3 0x563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at latch.c:538
#4 0x563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 ) at proc.c:1339
#5 0x563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, owner=0x563c40ea5af8) at lock.c:1816
#6 0x563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080
#7 0x563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116
#8 0x563c3f537aff in relation_open (relationId=1213, lockmode=1) at relation.c:55
#9 0x563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44
#10 0x563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at catcache.c:980
#11 0x563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, touch_index=true) at catcache.c:1083
#12 0x563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184
#13 0x563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317
#14 0x563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law", useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177
#15 0x563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", username=0x563c40e53fe8 "law") at postgres.c:4229
#16 0x563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475

It looks like no new backend can be started due to the pg_tablespace lock, when a new relcache file is needed during backend initialization.

Best regards,
Alexander
Re: postgres_fdw test timeouts
Hello Nathan,

08.03.2024 01:00, Nathan Bossart wrote:
> On Sun, Dec 10, 2023 at 12:00:01PM +0300, Alexander Lakhin wrote:
>> So I would not say that it's a dominant failure for now, and given that 04a09ee lives in master only, maybe we can save two commits (Revert + Revert of revert) while moving to a more persistent solution.
> I just checked in on this one to see whether we needed to create an "open item" for v17. While I'm not seeing the failures anymore, the patch that Alexander claimed should fix it [0] doesn't appear to have been committed, either. Perhaps this was fixed by something else...
> [0] https://postgr.es/m/CA%2BhUKGL0bikWSC2XW-zUgFWNVEpD_gEWXndi2PE5tWqmApkpZQ%40mail.gmail.com

I have re-run the tests and found out that the issue was fixed by d3c5f37dd. It changed the body of the loop "while (PQisBusy(conn))", formerly contained in pgfdw_get_result(), as follows:

    /* Data available in socket? */
    if (wc & WL_SOCKET_READABLE)
    {
        if (!PQconsumeInput(conn))
            pgfdw_report_error(ERROR, NULL, conn, false, query);
    }
->
    /* Consume whatever data is available from the socket */
    if (PQconsumeInput(conn) == 0)
    {
        /* trouble; expect PQgetResult() to return NULL */
        break;
    }

That is, the unconditional "if PQconsumeInput() ..." eliminates the test timeout.

Best regards,
Alexander
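[Editor's note] The shape of the two loops can be contrasted abstractly. The following Python sketch is a schematic model, not libpq code — FakeConn is hypothetical — illustrating why consuming input only when the OS socket reports readable can stall when data already sits in the library's internal buffer, while consuming unconditionally on every wakeup always drains it:

```python
class FakeConn:
    """Hypothetical stand-in for a libpq-like connection: responses can
    already sit in the library's buffer while the socket shows nothing new."""
    def __init__(self):
        self.buffered = 3             # responses already read into the buffer
        self.socket_readable = False  # kernel reports no new data

    def is_busy(self):                # analogous to PQisBusy()
        return self.buffered > 0

    def consume_input(self):          # analogous to PQconsumeInput()
        if self.buffered:
            self.buffered -= 1
        return 1                      # nonzero means success

def readable_only_loop(conn, max_waits=10):
    """Old shape: consume only on WL_SOCKET_READABLE -- can wait forever."""
    waits = 0
    while conn.is_busy():
        if waits >= max_waits:
            return "timed out"
        if conn.socket_readable:
            conn.consume_input()
        waits += 1
    return "done"

def unconditional_loop(conn):
    """New shape (d3c5f37dd): consume whatever is available on each wakeup."""
    while conn.is_busy():
        if conn.consume_input() == 0:
            break  # trouble; expect a NULL result
    return "done"

print(readable_only_loop(FakeConn()))  # timed out
print(unconditional_loop(FakeConn()))  # done
```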
Re: partitioning and identity column
20.02.2024 07:57, Ashutosh Bapat wrote:
>> Could you please name the functions you suspect, so I can recheck them? Perhaps we should consider fixing all such functions, in light of b0f7dd915 and d57b7cc33...
> Looks like the second commit has fixed all other places I knew except the Identity-related functions. So worth fixing the identity-related functions too. I see dropconstraint_internal() has two calls to check_stack_depth() back to back. The second one is not needed?

Yeah, that's funny. It looks like such double protection emerged because Alvaro protected the function (in b0f7dd915) while it was already waiting for the addition of check_stack_depth() in the other thread (which resulted in d57b7cc33).

Thank you for spending time on this!

Best regards,
Alexander
Re: partitioning and identity column
Hello Ashutosh,

19.02.2024 15:17, Ashutosh Bapat wrote:
>> Functions ATExecAddIdentity() and ATExecDropIdentity() are recursive too, so I think they can be exploited as well.
> not just Identity related functions, but many other functions in tablecmds.c have that problem as I mentioned earlier.

Could you please name the functions you suspect, so I can recheck them? Perhaps we should consider fixing all such functions, in light of b0f7dd915 and d57b7cc33...

Best regards,
Alexander
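[Editor's note] The guard under discussion, check_stack_depth(), makes unbounded recursion (e.g. over a deeply nested partition hierarchy) fail with a clean error instead of crashing the backend. A schematic Python model of the pattern — the names walk_inheritance_tree and MAX_DEPTH are illustrative, not the tablecmds.c code:

```python
MAX_DEPTH = 50  # stand-in for the max_stack_depth limit

def check_stack_depth(depth):
    # PostgreSQL raises "stack depth limit exceeded" rather than
    # letting the recursion overflow the real stack
    if depth > MAX_DEPTH:
        raise RecursionError("stack depth limit exceeded")

def walk_inheritance_tree(rel, depth=0):
    """Recurse into child relations; the guard at the top of each call
    is what the unprotected tablecmds.c functions were missing."""
    check_stack_depth(depth)
    for child in rel.get("children", []):
        walk_inheritance_tree(child, depth + 1)

# Build a partition chain deeper than the limit
root = {}
node = root
for _ in range(100):
    node["children"] = [{}]
    node = node["children"][0]

try:
    walk_inheritance_tree(root)
except RecursionError as e:
    print(e)  # stack depth limit exceeded
```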
Re: Removing unneeded self joins
18.02.2024 19:18, Alexander Korotkov wrote:
> Attached is a draft patch fixing this query. Could you, please, recheck?

Yes, this patch fixes the behavior for that query (I've also tried several similar queries). Though I don't know the code well enough to judge the code change.

Best regards,
Alexander