Re: ResourceOwner refactoring

2024-06-01 Thread Alexander Lakhin

Hello Heikki,

I've stumbled upon yet another anomaly introduced with b8bff07da.

Please try the following script:
SELECT 'init' FROM pg_create_logical_replication_slot('slot',
  'test_decoding');
CREATE TABLE tbl(t text);
BEGIN;
TRUNCATE table tbl;

SELECT data FROM pg_logical_slot_get_changes('slot', NULL, NULL,
 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');

On b8bff07da (and the current HEAD), it ends up with:
ERROR:  ResourceOwnerEnlarge called after release started

But on the previous commit, I get no error:
 data
--
(0 rows)

Best regards,
Alexander




Re: To what extent should tests rely on VACUUM ANALYZE?

2024-05-31 Thread Alexander Lakhin

29.03.2024 11:59, Alexander Lakhin wrote:

But it looks like subselect is not the only test that can fail due to
vacuum instability. I see that create_index also suffers from cranky
ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)  ...


Just for the record, I think I've reproduced the same failure as:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2024-03-17%2003%3A03%3A57
not ok 66    + create_index    27509 ms
...

and the similar occurrences:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2024-01-02%2007%3A09%3A09
not ok 66    + create_index    25830 ms

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2023-11-15%2006%3A16%3A15
not ok 66    + create_index    38508 ms

by running 8 instances of 027_stream_regress in parallel on a slow ARMv7
device like this:
for i in `seq 10`; do echo "I $i"; parallel --halt now,fail=1  -j8 \
 --linebuffer --tag NO_TEMP_INSTALL=1 make -s check -C \
 src/test/recovery_{}/ PROVE_TESTS="t/027*" ::: `seq 8` || break; done
5
5   #   Failed test 'regression tests pass'
5   #   at t/027_stream_regress.pl line 92.
5   #  got: '256'
5   # expected: '0'
5   t/027_stream_regress.pl ..
5   Dubious, test returned 1 (wstat 256, 0x100)
5   Failed 1/6 subtests

not ok 66    + create_index   152995 ms
...
=== dumping .../src/test/recovery_5/tmp_check/regression.diffs ===
diff -U3 .../src/test/regress/expected/create_index.out 
.../src/test/recovery_5/tmp_check/results/create_index.out
--- .../src/test/regress/expected/create_index.out  2024-05-30 
15:30:34.523048633 +
+++ .../src/test/recovery_5/tmp_check/results/create_index.out 2024-05-31 
13:07:56.359001362 +
@@ -1916,11 +1916,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-  QUERY PLAN

- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+    QUERY PLAN
+---
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+ Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+ ->  Bitmap Index Scan on tenk1_unique1
+   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
@@ -1936,12 +1940,13 @@
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
 ORDER BY thousand;
-  QUERY PLAN

- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+  QUERY PLAN
+--
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+ Index Cond: ((thousand < 2) AND (tenthous = ANY 
('{1001,3000}'::integer[])))
+(4 rows)

 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
=== EOF ===

I got failures on iterations 2, 3, 7, and 1.

But with the repeated VACUUM ANALYZE:
--- a/src/test/regress/sql/test_setup.sql
+++ b/src/test/regress/sql/test_setup.sql
@@ -163,6 +163,8 @@ CREATE TABLE tenk1 (
 \set filename :abs_srcdir '/data/tenk.data'
 COPY tenk1 FROM :'filename';
 VACUUM ANALYZE tenk1;
+VACUUM ANALYZE tenk1;
+VACUUM ANALYZE tenk1;

20 iterations succeeded in the same environment.

So I think that the IOS plan change can be explained by the issue
discussed here.

Best regards,
Alexander




The xversion-upgrade test fails to stop server

2024-05-30 Thread Alexander Lakhin

Hello Andrew,

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut 
down... 
failed

pg_ctl: server does not shut down

Looking at:
https://github.com/PGBuildFarm/client-code/blob/05014d50e/PGBuild/Modules/TestUpgradeXversion.pm#L641

I see that ctl4.log is created after updating extensions and
REL9_5_STABLE-update_extensions.log contains:
You are now connected to database "contrib_regression_redis_fdw" as user 
"buildfarm".
ALTER EXTENSION "hstore" UPDATE;
ALTER EXTENSION
You are now connected to database "contrib_regression_btree_gin" as user 
"buildfarm".
ALTER EXTENSION "btree_gin" UPDATE;
ALTER EXTENSION
...
but I see no corresponding server log file containing these commands in the
failure log.

When running the same test locally, I find these in inst/upgrade_log.

Maybe uploading this log file too would help in understanding the cause of
the failure...

Best regards,
Alexander




001_rep_changes.pl fails due to publisher stuck on shutdown

2024-05-29 Thread Alexander Lakhin

Hello hackers,

As a recent buildfarm test failure [1] shows:
[14:33:02.374](0.333s) ok 23 - update works with dropped subscriber column
### Stopping node "publisher" using mode fast
# Running: pg_ctl -D 
/home/bf/bf-build/adder/HEAD/pgsql.build/testrun/subscription/001_rep_changes/data/t_001_rep_changes_publisher_data/pgdata 
-m fast stop

waiting for server to shut down.. ... ... ... .. failed
pg_ctl: server does not shut down
# pg_ctl stop failed: 256
# Postmaster PID for node "publisher" is 549
[14:39:04.375](362.001s) Bail out!  pg_ctl stop failed

001_rep_changes_publisher.log
2024-05-16 14:33:02.907 UTC [2238704][client backend][4/22:0] LOG: statement: 
DELETE FROM tab_rep
2024-05-16 14:33:02.925 UTC [2238704][client backend][:0] LOG: disconnection: session time: 0:00:00.078 user=bf 
database=postgres host=[local]

2024-05-16 14:33:02.939 UTC [549][postmaster][:0] LOG:  received fast 
shutdown request
2024-05-16 14:33:03.000 UTC [549][postmaster][:0] LOG:  aborting any active 
transactions
2024-05-16 14:33:03.049 UTC [549][postmaster][:0] LOG: background worker "logical replication launcher" (PID 
2223110) exited with exit code 1

2024-05-16 14:33:03.062 UTC [901][checkpointer][:0] LOG: shutting down
2024-05-16 14:39:04.377 UTC [549][postmaster][:0] LOG:  received immediate 
shutdown request
2024-05-16 14:39:04.382 UTC [549][postmaster][:0] LOG:  database system is 
shut down

the publisher node may hang on stopping.

I reproduced the failure (with aggressive autovacuum) locally and
discovered that it happens because:
1) checkpointer calls WalSndInitStopping() (which sends
 PROCSIG_WALSND_INIT_STOPPING to walsender), and then spins inside
 WalSndWaitStopping() indefinitely, because:
2) walsender receives the signal, sets got_STOPPING = true, but can't exit
WalSndLoop():
3) it never sets got_SIGUSR2 (to get into WalSndDone()) in
 XLogSendLogical():
4) it never sets WalSndCaughtUp to true:
5) logical_decoding_ctx->reader->EndRecPtr can't reach flushPtr in
 XLogSendLogical():
6) EndRecPtr doesn't advance in XLogNextRecord():
7) XLogDecodeNextRecord() fails to decode a record that crosses a page
 boundary:
8) ReadPageInternal() (commented "Wait for the next page to become
 available") constantly returns XLREAD_FAIL:
9) state->routine.page_read()/logical_read_xlog_page() constantly returns
 -1:
10) flushptr = WalSndWaitForWal() stops advancing, because
 got_STOPPING == true (see 2).

That is, walsender doesn't let itself catch up if it gets the stop
signal while it's lagging behind and decoding a record requires reading
the next WAL page.

Please look at the attached reproducing test (based on 001_rep_changes.pl).
It fails for me as below:
# 17
Bailout called.  Further testing stopped:  pg_ctl stop failed
FAILED--Further testing stopped: pg_ctl stop failed
make: *** [Makefile:21: check] Error 255

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-05-16%2014%3A22%3A38
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop=2024-04-24%2014%3A38%3A35
 (apparently the same)

Best regards,
Alexander

099_walsender_stop.pl
Description: Perl program


Test 031_recovery_conflict fails when a conflict counted twice

2024-05-27 Thread Alexander Lakhin

Hello hackers,

As a recent buildfarm test failure on olingo (which builds postgres with
-O1 and address sanitizer) [1] shows:
...
[23:12:02.127](0.166s) not ok 6 - snapshot conflict: stats show conflict on 
standby
[23:12:02.130](0.003s) #   Failed test 'snapshot conflict: stats show conflict 
on standby'
#   at 
/home/bf/bf-build/olingo/HEAD/pgsql/src/test/recovery/t/031_recovery_conflict.pl
 line 332.
[23:12:02.130](0.000s) #  got: '2'
# expected: '1'
...
[23:12:06.848](1.291s) not ok 17 - 5 recovery conflicts shown in 
pg_stat_database
[23:12:06.887](0.040s) #   Failed test '5 recovery conflicts shown in 
pg_stat_database'
#   at 
/home/bf/bf-build/olingo/HEAD/pgsql/src/test/recovery/t/031_recovery_conflict.pl
 line 286.
[23:12:06.887](0.000s) #  got: '6'
# expected: '5'
Waiting for replication conn standby's replay_lsn to pass 0/3459160 on primary
done
...

pgsql.build/testrun/recovery/031_recovery_conflict/log/031_recovery_conflict_standby.log:
2024-05-15 23:12:01.616 UTC [1299981][client backend][2/2:0] LOG: statement: FETCH FORWARD FROM 
test_recovery_conflict_cursor;

2024-05-15 23:12:01.617 UTC [1299981][client backend][2/2:0] LOG: statement: ;
2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] LOG: recovery still waiting after 15.289 ms: recovery conflict on 
snapshot

2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] DETAIL: Conflicting 
process: 1299981.
2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] CONTEXT:  WAL redo at 0/344F468 for Heap2/PRUNE_VACUUM_SCAN: 
snapshotConflictHorizon: 746, isCatalogRel: F, nplans: 2, nredirected: 18, ndead: 0, nunused: 0, plans: [{ xmax: 0, 
infomask: 2816, infomask2: 2, ntuples: 2, offsets: [21, 22] }, { xmax: 0, infomask: 11008, infomask2: 32770, ntuples: 
18, offsets: [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58] }], redirected: [23->41, 24->42, 
25->43, 26->44, 27->45, 28->46, 29->47, 30->48, 31->49, 32->50, 33->51, 34->52, 35->53, 36->54, 37->55, 38->56, 39->57, 
40->58]; blkref #0: rel 1663/16385/16386, blk 0

2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] FATAL: terminating 
connection due to conflict with recovery
2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] DETAIL:  User query might have needed to see row versions 
that must be removed.
2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] HINT: In a moment you should be able to reconnect to the 
database and repeat your command.

vvv an important difference from a successful test run
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] LOG: could not 
send data to client: Broken pipe
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] FATAL: terminating 
connection due to conflict with recovery
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] DETAIL:  User query might have needed to see row versions 
that must be removed.
2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] HINT: In a moment you should be able to reconnect to the 
database and repeat your command.

^^^

test  031_recovery_conflict may fail in the following scenario:

031_recovery_conflict.pl:
    executes  a query, which produces a conflict:
    ## RECOVERY CONFLICT 2: Snapshot conflict
    ...
    $psql_standby->query_safe(...)

startup process:
    detects a snapshot conflict and sends
    PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
    (ResolveRecoveryConflictWithVirtualXIDs ->
    CancelVirtualTransaction) to the client backend

client backend:
    receives and processes the signal:
    HandleRecoveryConflictInterrupt; ProcessClientReadInterrupt ->
    ProcessInterrupts -> ProcessRecoveryConflictInterrupts ->
    ProcessRecoveryConflictInterrupt,

    reports the recovery conflict:
    pgstat_report_recovery_conflict(reason);

    and reports the error:
    ereport(FATAL, ... "terminating connection due to conflict with
    recovery" ...)
    sends the message to the server log:
    errfinish -> EmitErrorReport -> send_message_to_server_log

031_recovery_conflict.pl:
    # finds the message in the log and resets psql connection:
    check_conflict_log(
  "User query might have needed to see row versions that must
   be removed");
    $psql_standby->reconnect_and_clear();

startup process:
    keeps sending PROCSIG_RECOVERY_CONFLICT_SNAPSHOT to the client
    backend in a loop inside ResolveRecoveryConflictWithVirtualXIDs

client backend:
    tries to send the message to the client:
    send_message_to_frontend -> socket_flush ->
    internal_flush_buffer,
    gets an error (EPIPE) from secure_write, and calls
    ereport(COMMERROR,
  (errcode_for_socket_access(),
   errmsg("could not send data to client: %m")));

   receives the following PROCSIG_RECOVERY_CONFLICT_SNAPSHOT
   signal and processes it the same way, reporting the recovery conflict
   once more, so the conflict gets counted twice

Re: Improving tracking/processing of buildfarm test failures

2024-05-25 Thread Alexander Lakhin

Hello Amit and Noah,

24.05.2024 14:15, Amit Kapila wrote:

I feel it is a good idea to do something about this. It makes sense to
start with something simple and see how it works. I think this can
also help us whether we need to chase a particular BF failure
immediately after committing.


24.05.2024 23:00, Noah Misch wrote:




(I could start with the second approach, if you don't mind, and we'll see
how it works.)

Certainly you doing (2) can only help, though it may help less than (1).


Thank you for paying attention to this!

I've created such a page to accumulate information on test failures:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures

I've deliberately added a trivial issue with partition_split, which is
bound to be fixed soon, to test the information workflow, and I'm going
to add a few other items in the coming days.

Please share your comments and suggestions, if any.

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-24 Thread Alexander Lakhin

Hello,

24.05.2024 22:29, Tom Lane wrote:

The partition_split test has unstable results, as shown at [1].
I suggest adding "ORDER BY conname" to the two queries shown
to fail there.  Better look at other queries in the test for
possible similar problems, too.


Yes, I've just reproduced it on an aarch64 device as follows:
echo "autovacuum_naptime = 1
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
" > ~/temp.config
TEMP_CONFIG=~/temp.config TESTS="$(printf 'partition_split %.0s' `seq 100`)" 
make -s check-tests
...
ok 80    - partition_split   749 ms
not ok 81    - partition_split   728 ms
ok 82    - partition_split   732 ms

$ cat src/test/regress/regression.diffs
diff -U3 .../src/test/regress/expected/partition_split.out 
.../src/test/regress/results/partition_split.out
--- .../src/test/regress/expected/partition_split.out   2024-05-15 
17:15:57.171999830 +
+++ .../src/test/regress/results/partition_split.out    2024-05-24 
19:28:37.32749 +
@@ -625,8 +625,8 @@
 SELECT pg_get_constraintdef(oid), conname, conkey FROM pg_constraint WHERE conrelid = 
'sales_feb_mar_apr2022'::regclass::oid;

  pg_get_constraintdef | conname | conkey
 ----------------------+---------+--------
- CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2}
  FOREIGN KEY (salesperson_id) REFERENCES salespeople(salesperson_id) | sales_range_salesperson_id_fkey | {1}
+ CHECK ((sales_amount > 1)) | sales_range_sales_amount_check | {2}
 (2 rows)

 ALTER TABLE sales_range SPLIT PARTITION sales_feb_mar_apr2022 INTO
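
For reference, the suggested stabilization applied to that query would look
like this (a sketch; per the quoted suggestion, the other failing query in the
test would need the same ORDER BY):

SELECT pg_get_constraintdef(oid), conname, conkey FROM pg_constraint
  WHERE conrelid = 'sales_feb_mar_apr2022'::regclass::oid
  ORDER BY conname;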

Best regards,
Alexander




Improving tracking/processing of buildfarm test failures

2024-05-23 Thread Alexander Lakhin

Hello hackers,

I'd like to discuss ways to improve the buildfarm experience for anyone who
is interested in using the information the buildfarm gives us.

Unless I'm missing something, as of now there are no means to determine
whether some concrete failure is known/investigated or fixed, how
frequently it occurs, and so on... From my experience, it's quite possible
that some failure which occurred two years ago and was lost in time was
an indication of e. g. a race condition still existing in the code/tests
and thus worth fixing. But without classifying/marking failures it's hard
to find such failures, or other interesting ones, among many others...

The first way to improve things I can imagine is to add two fields to the
buildfarm database: a link to the failure discussion (set when the failure
is investigated/reproduced and reported in -bugs or -hackers) and a commit
id/link (set when the failure is fixed). I understand that it requires
modifying the buildfarm code and adding some UI to update these fields,
but it would allow adding filters to show only unknown/non-investigated
failures in the buildfarm web interface later.

The second way is to create a wiki page, similar to "PostgreSQL 17 Open
Items", say, "Known buildfarm test failures" and fill it like below:


...
Useful info from the failure logs for reference
...

---
This way is less invasive, but it would work well only if most interested
people know of it and use it.
(I could start with the second approach, if you don't mind, and we'll see
how it works.)

Best regards,
Alexander

Re: Testing autovacuum wraparound (including failsafe)

2024-05-22 Thread Alexander Lakhin

Hello,

30.11.2023 10:35, Masahiko Sawada wrote:



I've attached new version patches (0002 and 0003 are unchanged except
for the commit message). I'll push them, barring any objections.


Pushed.


I've discovered that the test 001_emergency_vacuum.pl can fail due to a
race condition. I can't see the server log at [1], but I reproduced the
failure locally, and with additional logging and log_min_messages = DEBUG3
the log shows:
...
2024-05-22 11:46:28.125 UTC [21256:2853] DEBUG:  SlruScanDirectory invoking 
callback on pg_xact/0690
2024-05-22 11:46:28.125 UTC [21256:2854] DEBUG:  transaction ID wrap limit is 
2147484396, limited by database with OID 5
2024-05-22 11:46:28.126 UTC [21256:2855] LOG: !!!SendPostmasterSignal| 
PMSIGNAL_START_AUTOVAC_LAUNCHER
2024-05-22 11:46:28.135 UTC [14871:20077] DEBUG:  postmaster received pmsignal 
signal
2024-05-22 11:46:28.137 UTC [21257:1] DEBUG:  autovacuum launcher started
2024-05-22 11:46:28.137 UTC [21257:2] DEBUG:  InitPostgres
2024-05-22 11:46:28.138 UTC [21257:3] LOG:  !!!AutoVacLauncherMain| !AutoVacuumingActive() && !ShutdownRequestPending; 
before do_start_worker()

2024-05-22 11:46:28.138 UTC [21257:4] LOG:  !!!do_start_worker| return quickly 
when there are no free workers
2024-05-22 11:46:28.138 UTC [21257:5] DEBUG:  shmem_exit(0): 4 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:6] DEBUG:  shmem_exit(0): 6 on_shmem_exit 
callbacks to make
2024-05-22 11:46:28.138 UTC [21257:7] DEBUG:  proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.138 UTC [21257:8] DEBUG:  exit(0)
2024-05-22 11:46:28.138 UTC [21257:9] DEBUG:  shmem_exit(-1): 0 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:10] DEBUG:  shmem_exit(-1): 0 on_shmem_exit 
callbacks to make
2024-05-22 11:46:28.138 UTC [21257:11] DEBUG:  proc_exit(-1): 0 callbacks to 
make
2024-05-22 11:46:28.146 UTC [21256:2856] DEBUG:  MultiXactId wrap limit is 
2147483648, limited by database with OID 5
2024-05-22 11:46:28.146 UTC [21256:2857] DEBUG:  MultiXact member stop limit is 
now 4294914944 based on MultiXact 1
2024-05-22 11:46:28.146 UTC [21256:2858] DEBUG:  shmem_exit(0): 4 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2859] DEBUG:  shmem_exit(0): 7 on_shmem_exit 
callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2860] DEBUG:  proc_exit(0): 1 callbacks to 
make
2024-05-22 11:46:28.147 UTC [21256:2861] DEBUG:  exit(0)
2024-05-22 11:46:28.147 UTC [21256:2862] DEBUG:  shmem_exit(-1): 0 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2863] DEBUG:  shmem_exit(-1): 0 
on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2864] DEBUG:  proc_exit(-1): 0 callbacks to 
make
2024-05-22 11:46:28.151 UTC [14871:20078] DEBUG:  forked new backend, pid=21258 
socket=8
2024-05-22 11:46:28.171 UTC [14871:20079] DEBUG:  server process (PID 21256) 
exited with exit code 0
2024-05-22 11:46:28.152 UTC [21258:1] [unknown] LOG:  connection received: 
host=[local]
2024-05-22 11:46:28.171 UTC [21258:2] [unknown] DEBUG:  InitPostgres
2024-05-22 11:46:28.172 UTC [21258:3] [unknown] LOG:  connection authenticated: user="vagrant" method=trust 
(/pgtest/postgresql.git/src/test/modules/xid_wraparound/tmp_check/t_001_emergency_vacuum_main_data/pgdata/pg_hba.conf:117)
2024-05-22 11:46:28.172 UTC [21258:4] [unknown] LOG:  connection authorized: user=vagrant database=postgres 
application_name=001_emergency_vacuum.pl

2024-05-22 11:46:28.175 UTC [21258:5] 001_emergency_vacuum.pl LOG: statement: 
INSERT INTO small(data) SELECT 1

That is, the autovacuum worker (21256) sent PMSIGNAL_START_AUTOVAC_LAUNCHER,
the postmaster started the autovacuum launcher, which could not start a new
autovacuum worker because process 21256 had not exited yet.

The failure can be reproduced easily with the sleep added inside
SetTransactionIdLimit():
    if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
    IsUnderPostmaster && !InRecovery)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
+pg_usleep(1L);

By the way, I also discovered that the rather resource-intensive
xid_wraparound tests are executed twice during the buildfarm "make" run (on
the dodo and perentie animals (see [2])), first at the
module-xid_wraparound-check stage and then at the testmodules-install-check-C
stage.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo=2024-05-19%2006%3A33%3A34
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=perentie=2024-05-22%2000%3A02%3A19

Best regards,
Alexander




Re: Cleaning up perl code

2024-05-20 Thread Alexander Lakhin

Hello Dagfinn,

Thank you for paying attention to it and improving the possible fix!

20.05.2024 23:39, Dagfinn Ilmari Mannsåker wrote:

Nice cleanup!  Did you use some static analysis tool, or did look for
them manually?


I reviewed the collection of unica I had gathered over several months, but
found some of them too minor or requiring more analysis.
Then I added more with perlcritic's policy UnusedVariables, and also
checked for unused subs with a script from blogs.perl.org (and it confirmed
my only previous find of that kind).


  If I add [Variables::ProhibitUnusedVariables] to
src/tools/perlcheck/perlcriticrc, it finds a few more, see the attached
patch.


Yes, I saw unused $sqlstates, but decided that they are meaningful enough
to stay. Though maybe enabling ProhibitUnusedVariables justifies fixing
them too.
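
For illustration, trying the policy locally might look like this (a sketch; the
perlcriticrc entry format is assumed to follow the file's existing convention of
raising a policy's severity to 5):

cat >> src/tools/perlcheck/perlcriticrc <<'EOF'

[Variables::ProhibitUnusedVariables]
severity = 5
EOF
src/tools/perlcheck/pgperlcritic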


The scripts parsing errcodes.txt really should be refactored into using
a common module, but that's a patch for another day.


Agree, and I would leave 005_negotiate_encryption.pl (with $node_conf,
$server_config unused since d39a49c1e) aside for another day too.

Best regards,
Alexander




Cleaning up perl code

2024-05-20 Thread Alexander Lakhin

Hello hackers,

Please look at a bunch of unused variables and a couple of other defects
I found in the perl code; maybe you'll find them worth fixing:
contrib/amcheck/t/001_verify_heapam.pl
$result # unused since introduction in 866e24d47
unused sub:
get_toast_for # not used since 860593ec3

contrib/amcheck/t/002_cic.pl
$result # orphaned since 7f580aa5d

src/backend/utils/activity/generate-wait_event_types.pl
$note, $note_name # unused since introduction in fa8892847

src/bin/pg_dump/t/003_pg_dump_with_server.pl
$cmd, $stdout, $stderr, $result # unused since introduction in 2f9eb3132

src/bin/pg_dump/t/005_pg_dump_filterfile.pl
$cmd, $stdout, $stderr, $result # unused since introduction in a5cf808be

src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
$slapd, $ldap_schema_dir # unused since introduction in 419a8dd81

src/test/modules/ssl_passphrase_callback/t/001_testfunc.pl
$clearpass # orphaned since b846091fd

src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
$ostmt # unused since introduction in 47bb9db75

src/test/recovery/t/021_row_visibility.pl
$ret # unused since introduction in 7b28913bc

src/test/recovery/t/032_relfilenode_reuse.pl
$ret # unused since introduction in e2f65f425

src/test/recovery/t/035_standby_logical_decoding.pl
$stdin, $ret, $slot # unused since introduction in fcd77d532
$subscriber_stdin, $subscriber_stdout, $subscriber_stderr # unused since 
introduction in 376dc8205

src/test/subscription/t/024_add_drop_pub.pl
invalid reference in a comment:
tab_drop_refresh -> tab_2 # introduced with 1046a69b3
(invalid since v6-0001-...patch in the commit's thread)

src/tools/msvc_gendef.pl
@def # unused since introduction in 933b46644?

I've attached a patch with all of these changes (tested with meson build
on Windows and check-world on Linux).

Best regards,
Alexander

diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 9de3148277..481e4dbe4f 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -9,7 +9,7 @@ use PostgreSQL::Test::Utils;
 
 use Test::More;
 
-my ($node, $result);
+my $node;
 
 #
 # Test set-up
@@ -87,19 +87,6 @@ sub relation_filepath
 	return "$pgdata/$rel";
 }
 
-# Returns the fully qualified name of the toast table for the named relation
-sub get_toast_for
-{
-	my ($relname) = @_;
-
-	return $node->safe_psql(
-		'postgres', qq(
-		SELECT 'pg_toast.' || t.relname
-			FROM pg_catalog.pg_class c, pg_catalog.pg_class t
-			WHERE c.relname = '$relname'
-			  AND c.reltoastrelid = t.oid));
-}
-
 # (Re)create and populate a test table of the given name.
 sub fresh_test_table
 {
diff --git a/contrib/amcheck/t/002_cic.pl b/contrib/amcheck/t/002_cic.pl
index 53a3db9745..0207407ca0 100644
--- a/contrib/amcheck/t/002_cic.pl
+++ b/contrib/amcheck/t/002_cic.pl
@@ -10,7 +10,7 @@ use PostgreSQL::Test::Utils;
 
 use Test::More;
 
-my ($node, $result);
+my $node;
 
 #
 # Test set-up
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 42f36f405b..b2d61bd8ba 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -42,8 +42,6 @@ my @abi_compatibility_lines;
 my @lines;
 my $abi_compatibility = 0;
 my $section_name;
-my $note;
-my $note_name;
 
 # Remove comments and empty lines and add waitclassname based on the section
 while (<$wait_event_names>)
diff --git a/src/bin/pg_dump/t/003_pg_dump_with_server.pl b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
index b5a1445550..3f549f44e7 100644
--- a/src/bin/pg_dump/t/003_pg_dump_with_server.pl
+++ b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
@@ -26,7 +26,6 @@ $node->safe_psql('postgres', "CREATE SERVER s1 FOREIGN DATA WRAPPER dummy");
 $node->safe_psql('postgres', "CREATE SERVER s2 FOREIGN DATA WRAPPER dummy");
 $node->safe_psql('postgres', "CREATE FOREIGN TABLE t0 (a int) SERVER s0");
 $node->safe_psql('postgres', "CREATE FOREIGN TABLE t1 (a int) SERVER s1");
-my ($cmd, $stdout, $stderr, $result);
 
 command_fails_like(
 	[ "pg_dump", '-p', $port, '--include-foreign-data=s0', 'postgres' ],
diff --git a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
index a80e13a0d3..6025bb296c 100644
--- a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
+++ b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
@@ -81,7 +81,6 @@ $node->safe_psql('sourcedb',
 #
 # Test interaction of correctly specified filter file
 #
-my ($cmd, $stdout, $stderr, $result);
 
 # Empty filterfile
 open $inputfile, '>', "$tempdir/inputfile.txt"
diff --git a/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl b/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
index b990e7d101..82b1cb88e9 100644
--- a/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
+++ b/src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl
@@ -12,10 +12,6 @@ use Test::More;
 

Re: Avoid orphaned objects dependencies, take 3

2024-05-19 Thread Alexander Lakhin

Hello Bertrand,

15.05.2024 11:31, Bertrand Drouvot wrote:

On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:


+if (object_description)
+ereport(ERROR, errmsg("%s does not exist", 
object_description));
+else
+ereport(ERROR, errmsg("a dependent object does not ex

This generates an internal error code.  Is that intended?

Yes, it's like when say you want to create an object in a schema that does not
exist (see get_namespace_oid()).


AFAICS, get_namespace_oid() throws not ERRCODE_INTERNAL_ERROR,
but ERRCODE_UNDEFINED_SCHEMA:

# SELECT regtype('unknown_schema.type');
ERROR:  schema "unknown_schema" does not exist
LINE 1: SELECT regtype('unknown_schema.type');
   ^
# \echo :LAST_ERROR_SQLSTATE
3F000

Probably it's worth avoiding ERRCODE_INTERNAL_ERROR here, in light of [1]
and [2], as these errors are not that abnormal (not Assert-like).
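
For illustration, the kind of change I mean would be something like this (a
sketch; the concrete SQLSTATE is an assumption, ERRCODE_UNDEFINED_OBJECT is
just one plausible choice):

    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_OBJECT),
             errmsg("%s does not exist", object_description)));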

[1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
[2] https://commitfest.postgresql.org/48/4735/

Best regards,
Alexander




Re: Underscore in positional parameters?

2024-05-18 Thread Alexander Lakhin

Hello Erik,

18.05.2024 04:31, Erik Wienhold wrote:

On 2024-05-17 02:06 +0200, Michael Paquier wrote:

On Thu, May 16, 2024 at 08:41:11AM +0200, Peter Eisentraut wrote:

On this specific patch, maybe reword "parameter too large" to "parameter
number too large".

WFM here.

Done in v3.


Thank you for working on this!

I encountered the anomalies that you address with this patch too,
and I can confirm that it fixes most cases, but there is another one:
SELECT $3 \bind 'foo' \g
ERROR:  invalid memory alloc request size 12

Maybe you would find this worth fixing as well.

Best regards,
Alexander




Re: Streaming read-ready sequential scan code

2024-05-18 Thread Alexander Lakhin

Hello Thomas,

18.05.2024 07:47, Thomas Munro wrote:

After more debugging, we learned a lot more things...

1.  That query produces spectacularly bad estimates, so we finish up
having to increase the number of buckets in a parallel hash join many
times.  That is quite interesting, but unrelated to new code.
2.  Parallel hash join is quite slow at negotiating an increase in the
number of hash bucket, if all of the input tuples are being filtered
out by quals, because of the choice of where workers check for
PHJ_GROWTH_NEED_MORE_BUCKETS.  That could be improved quite easily I
think.  I have put that on my todo list 'cause that's also my code,
but it's not a new issue it's just one that is now highlighted...
3.  This bit of read_stream.c is exacerbating unfairness in the
underlying scan, so that 1 and 2 come together and produce a nasty
slowdown, which goes away if you change it like so:

-   BlockNumber blocknums[16];
+   BlockNumber blocknums[1];

I will follow up after some more study.


Thank you for the information!

Unfortunately, I can't see significant differences in my environment with
parallel_leader_participation=off.

With blocknums[1], the timing changes, but the effect is not consistent.
10 query15 executions in a row, b7b0f3f27:
277.932 ms
281.805 ms
278.335 ms
281.565 ms
284.167 ms
283.171 ms
281.165 ms
281.615 ms
285.394 ms
277.301 ms

b7b0f3f27~1:
159.789 ms
165.407 ms
160.893 ms
159.343 ms
160.936 ms
161.577 ms
161.637 ms
163.421 ms
163.143 ms
167.109 ms

b7b0f3f27 + blocknums[1]:
164.133 ms
280.920 ms
160.748 ms
163.182 ms
161.709 ms
161.998 ms
161.239 ms
276.256 ms
161.601 ms
160.384 ms

I placed PGDATA on tmpfs to rule out any blockdev specifics (increasing
blockdev ra from 256 to 4096 didn't help me with PGDATA on NVME either.)

Best regards,
Alexander




Re: Streaming read-ready sequential scan code

2024-05-17 Thread Alexander Lakhin

Hello,

I decided to compare v17 vs v16 performance (as I did last year [1])
and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds)
benchmark, query15 (and several others, but I focused on this one):
Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > 
151.03): pg_tpcds.query15
Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > 
152.64): pg_tpcds.query15
Please look at the full html report attached in case you're interested.

(I used my pg-mark tool to measure/analyze performance, but I believe the
same results can be seen without it.)

`git bisect` for this performance degradation pointed at b7b0f3f27...

[1] 
https://www.postgresql.org/message-id/b32bed1b-0746-9b20-1472-4bdc9ca66d52%40gmail.com

Best regards,
Alexander

[Attachment: "Postgres benchmarking results", an HTML report (run started
2024-05-14T12:04:26, ended 2024-05-15T05:44:15) comparing PostgreSQL 16.3
(REL_16_STABLE/c1664c8eef) and 17devel (master/2e810bdb7f) instances on the
pgbench_native, pgbench_reference, pg_tpch and further benchmarks]

Re: Why is citext/regress failing on hamerkop?

2024-05-16 Thread Alexander Lakhin

Hello Thomas,

16.05.2024 04:32, Thomas Munro wrote:

On Thu, May 16, 2024 at 10:43 AM Thomas Munro  wrote:

Any chance you could test this version please Alexander?

Sorry, cancel that.  v3 is not good.  I assume it fixes the GSSAPI
thing and is superficially better, but it doesn't handle code that
calls twice in a row and ignores the first result (I know that
PostgreSQL does that occasionally in a few places), and it's also
broken if someone gets recv() = 0 (EOF), and then decides to wait
anyway.  The only ways I can think of to get full reliable poll()-like
semantics is to do that peek every time, OR the complicated patch
(per-socket-workspace + intercepting recv etc).  So I'm back to v2.


I've tested v2 and can confirm that it works like v1: `vcregress check`
passes with no failures on REL_16_STABLE, and `meson test` with the basic
configuration passes too.

By the way, hamerkop is not configured to enable gssapi for HEAD [1] and
I could not enable gss locally yet (just passing extra_lib_dirs,
extra_include_dirs doesn't work for me).

It looks like we need to find a way to enable it for meson to continue
testing v17+ with GSS on Windows.
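
Just to illustrate what I mean (a sketch; the Kerberos installation paths are
hypothetical), enabling it for a meson build would be something like:

meson setup build -Dgssapi=enabled -Dextra_include_dirs=c:/kfw/include -Dextra_lib_dirs=c:/kfw/lib
meson test -C build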

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hamerkop=2024-05-12%2011%3A00%3A28=configure

Best regards,
Alexander




Re: Why is citext/regress failing on hamerkop?

2024-05-15 Thread Alexander Lakhin

15.05.2024 01:26, Thomas Munro wrote:

OK, so we know what the problem is here.  Here is the simplest
solution I know of for that problem.  I have proposed this in the past
and received negative feedback because it's a really gross hack.  But
I don't personally know what else to do about the back-branches (or
even if that complex solution is the right way forward for master).
The attached kludge at least has the [de]merit of being a mirror image
of the kludge that follows it for the "opposite" event.  Does this fix
it?


Yes, I see that abandoned GSS connections are closed immediately, as
expected. I have also confirmed that `meson test` with the basic
configuration passes on REL_16_STABLE. So from the outside, the fix
looks good to me.

Thank you for working on this!

Best regards,
Alexander




Re: Why is citext/regress failing on hamerkop?

2024-05-14 Thread Alexander Lakhin

14.05.2024 17:38, Tom Lane wrote:

As I mentioned in our off-list discussion, I have a lingering feeling
that this v14 commit could be affecting the results somehow:

Author: Tom Lane 
Branch: master Release: REL_14_BR [d5a9a661f] 2020-10-18 12:56:43 -0400

 Update the Winsock API version requested by libpq.
 
 According to Microsoft's documentation, 2.2 has been the current
 version since Windows 98 or so.  Moreover, that's what the Postgres
 backend has been requesting since 2004 (cf commit 4cdf51e64).
 So there seems no reason for libpq to keep asking for 1.1.

I didn't believe at the time that that'd have any noticeable effect,
but maybe it somehow made Winsock play a bit nicer with the GSS
support?


Yes, probably, but maybe not nicer, as the test duration increased?
Still, I can't see the difference locally to check that commit.
I will try other VMs/configurations; maybe I can find a missing factor...

Best regards,
Alexander




Re: Avoid orphaned objects dependencies, take 3

2024-05-14 Thread Alexander Lakhin

Hi Bertrand,

09.05.2024 15:20, Bertrand Drouvot wrote:

Oh I see, your test updates an existing dependency. v4 took care about brand new
dependencies creation (recordMultipleDependencies()) but forgot to take care
about changing an existing dependency (which is done in another code path:
changeDependencyFor()).

Please find attached v5 that adds:

- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.

With v5 applied, I don't see the issue anymore.


Me too. Thank you for the improved version!
I will test the patch in the background, but for now I see no other
issues with it.

Best regards,
Alexander




Re: Why is citext/regress failing on hamerkop?

2024-05-14 Thread Alexander Lakhin

13.05.2024 23:17, Tom Lane wrote:

3. We don't know exactly why hamerkop suddenly started seeing these
failures, but a plausible theory emerges after noting that its
reported time for the successful "make check" step dropped pretty
substantially right when this started.  In the v13 branch, "make
check" was taking 2:18 or more in the several runs right before the
first isolationcheck failure, but 1:40 or less just after.  So it
looks like the animal was moved onto faster hardware.  That feeds
into this problem because, with a successful isolationcheck run
taking more than a minute, there was enough time for some of the
earlier stuck sessions to time out and exit before their
postmaster-child slots were needed.


Yes, and one thing I can't explain yet is why REL_14_STABLE+ timings
differ substantially from REL_13_STABLE-, say, for the check stage:
REL_14_STABLE: the oldest available test log (from 2021-10-30) shows
check (00:03:47) and the newest one (from 2024-05-12): check (00:03:18).
Whilst on REL_13_STABLE the oldest available log (from 2021-08-06) shows
check duration 00:03:00, then it decreased to 00:02:24 (2021-09-22)/
00:02:14 (2021-11-07), and now it's 1:40, as you said.

Locally I see more or less the same timings on REL_13_STABLE (34, 28, 27
secs) and on REL_14_STABLE (33, 29, 29 secs).

14.05.2024 03:38, Thomas Munro wrote:

I was beginning to suspect that lingering odour myself.  I haven't
look at the GSS code but I was imagining that what we have here is
perhaps not unsent data dropped on the floor due to linger policy
(unclean socket close on process exist), but rather that the server
didn't see the socket as ready to read because it lost track of the
FD_CLOSE somewhere because the client closed it gracefully, and our
server-side FD_CLOSE handling has always been a bit suspect.  I wonder
if the GSS code is somehow more prone to brokenness.  One thing we
learned in earlier problems was that abortive/error disconnections
generate FD_CLOSE repeatedly, while graceful ones give you only one.
In other words, if the other end politely calls closesocket(), the
server had better not miss the FD_CLOSE event, because it won't come
again.   That's what

https://commitfest.postgresql.org/46/3523/

is intended to fix.  Does it help here?  Unfortunately that's
unpleasantly complicated and unbackpatchable (keeping a side-table of
socket FDs and event handles, so we don't lose events between the
cracks).


Yes, that cure helps here too. I've tested it on b282fa88d~1 (the last
state when that patch set can be applied).

An excerpt (all lines related to process 12500) from a failed run log
without the patch set:
2024-05-14 05:57:29.526 UTC [8228:128] DEBUG:  forked new backend, pid=12500 
socket=5524
2024-05-14 05:57:29.534 UTC [12500:1] [unknown] LOG:  connection received: 
host=::1 port=51394
2024-05-14 05:57:29.534 UTC [12500:2] [unknown] LOG: !!!BackendInitialize| 
before ProcessStartupPacket
2024-05-14 05:57:29.534 UTC [12500:3] [unknown] LOG: !!!ProcessStartupPacket| 
before secure_open_gssapi(), GSSok: G
2024-05-14 05:57:29.534 UTC [12500:4] [unknown] LOG: !!!secure_open_gssapi| 
before read_or_wait
2024-05-14 05:57:29.534 UTC [12500:5] [unknown] LOG: !!!read_or_wait| before 
secure_raw_read(); PqGSSRecvLength: 0, len: 4
2024-05-14 05:57:29.534 UTC [12500:6] [unknown] LOG: !!!read_or_wait| after 
secure_raw_read: -1, errno: 10035
2024-05-14 05:57:29.534 UTC [12500:7] [unknown] LOG: !!!read_or_wait| before 
WaitLatchOrSocket()
2024-05-14 05:57:29.549 UTC [12500:8] [unknown] LOG: !!!read_or_wait| after 
WaitLatchOrSocket
2024-05-14 05:57:29.549 UTC [12500:9] [unknown] LOG: !!!read_or_wait| before 
secure_raw_read(); PqGSSRecvLength: 0, len: 4
2024-05-14 05:57:29.549 UTC [12500:10] [unknown] LOG: !!!read_or_wait| after 
secure_raw_read: 0, errno: 10035
2024-05-14 05:57:29.549 UTC [12500:11] [unknown] LOG: !!!read_or_wait| before 
WaitLatchOrSocket()
...
2024-05-14 05:57:52.024 UTC [8228:3678] DEBUG:  server process (PID 12500) 
exited with exit code 1
# at the end of the test run

And an excerpt (all lines related to process 11736) from a successful run
log with the patch set applied:
2024-05-14 06:03:57.216 UTC [4524:130] DEBUG:  forked new backend, pid=11736 
socket=5540
2024-05-14 06:03:57.226 UTC [11736:1] [unknown] LOG:  connection received: 
host=::1 port=51914
2024-05-14 06:03:57.226 UTC [11736:2] [unknown] LOG: !!!BackendInitialize| 
before ProcessStartupPacket
2024-05-14 06:03:57.226 UTC [11736:3] [unknown] LOG: !!!ProcessStartupPacket| 
before secure_open_gssapi(), GSSok: G
2024-05-14 06:03:57.226 UTC [11736:4] [unknown] LOG: !!!secure_open_gssapi| 
before read_or_wait
2024-05-14 06:03:57.226 UTC [11736:5] [unknown] LOG: !!!read_or_wait| before 
secure_raw_read(); PqGSSRecvLength: 0, len: 4
2024-05-14 06:03:57.226 UTC [11736:6] [unknown] LOG: !!!read_or_wait| after 
secure_raw_read: -1, errno: 10035
2024-05-14 06:03:57.226 UTC [11736:7] [unknown] LOG: !!!read_or_wait| before 
WaitLatchOrSocket()

Re: Why is citext/regress failing on hamerkop?

2024-05-12 Thread Alexander Lakhin

Hello Tom,

12.05.2024 08:34, Tom Lane wrote:

BTW, I've also been wondering why hamerkop has been failing
isolation-check in the 12 and 13 branches for the last six months
or so.  It is surely unrelated to this issue, and it looks like
it must be due to some platform change rather than anything we
committed at the time.

I'm not planning on looking into that question myself, but really
somebody ought to.  Or is Windows just as dead as AIX, in terms of
anybody being willing to put effort into supporting it?


I've reproduced the failure locally with GSS enabled, so I'll try to
figure out what's going on here in the next few days.

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-11 Thread Alexander Lakhin

Hello Dmitry and Alexander,

Please look at one more anomaly with temporary tables:
CREATE TEMP TABLE t (a int) PARTITION BY RANGE (a);
CREATE TEMP TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1) ;
CREATE TEMP TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;
-- succeeds, but:
ALTER TABLE t SPLIT PARTITION tp_0 INTO
  (PARTITION tp_0 FOR VALUES FROM (0) TO (1),  PARTITION tp_1 FOR VALUES FROM 
(1) TO (2));
-- fails with:
ERROR:  relation "tp_0" already exists

Though the same SPLIT succeeds with non-temporary tables...
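
For comparison, the equivalent sequence with regular tables (the same
statements, just without TEMP) goes through without the error:
CREATE TABLE t (a int) PARTITION BY RANGE (a);
CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;
ALTER TABLE t SPLIT PARTITION tp_0 INTO
  (PARTITION tp_0 FOR VALUES FROM (0) TO (1),  PARTITION tp_1 FOR VALUES FROM (1) TO (2));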

Best regards,
Alexander




Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Alexander Lakhin

11.05.2024 07:25, Thomas Munro wrote:

On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin  wrote:

11.05.2024 06:26, Thomas Munro wrote:

Perhaps a no-image, no-change registered buffer should not be
including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
useless for consistency checking too I guess, this issue aside,
because it doesn't change anything so there is nothing to check.

Yes, I think something is wrong here. I've reduced the reproducer to:

Does it reproduce if you do this?

-   include_image = needs_backup || (info &
XLR_CHECK_CONSISTENCY) != 0;
+   include_image = needs_backup ||
+   ((info & XLR_CHECK_CONSISTENCY) != 0 &&
+(regbuf->flags & REGBUF_NO_CHANGE) == 0);


No, it doesn't (at least with the latter, more targeted reproducer).


Unfortunately the back branches don't have that new flag from 00d7fb5e
so, even if this is the right direction (not sure, I don't understand
this clean registered buffer trick) then ... but wait, why are there
are no failures like this in the back branches (yet at least)?  Does
your reproducer work for 16?  I wonder if something relevant changed
recently, like f56a9def.  CC'ing Michael and Amit K for info.


Maybe it's hard to hit (autovacuum needs to process the index page in a
narrow time frame), but locally I could reproduce the issue even on
ac27c74de (and ~1 too) from 2018-09-06 (I tried the last several commits
touching hash indexes; I didn't dig deeper).

Best regards,
Alexander




Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Alexander Lakhin

Hello Thomas and Andres,

Thank you for looking at this!

11.05.2024 06:26, Thomas Munro wrote:

On Sat, May 11, 2024 at 3:57 AM Andres Freund  wrote:

On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote:

and discovered that XLogRecordAssemble() calculates CRC over a buffer,
that might be modified by another process.

If, with "might", you mean that it's legitimate for that buffer to be
modified, I don't think so.  The bug is that something is modifying the buffer
despite it being exclusively locked.

I.e. what we likely have here is a bug somewhere in the hash index code.

I don't have a good grip on the higher level locking protocols of
hash.c, but one microscopic thing jumps out:

 /*
  * bucket buffer was not changed, but still needs to be
  * registered to ensure that we can acquire a cleanup lock on
  * it during replay.
  */
 if (!xlrec.is_primary_bucket_page)
 {
 uint8flags = REGBUF_STANDARD |
REGBUF_NO_IMAGE | REGBUF_NO_CHANGE;

 XLogRegisterBuffer(0, bucket_buf, flags);
 }

That registers a buffer that is pinned but not content-locked, and we
tell xloginsert.c not to copy its image into the WAL, but it does it
anyway because:

 /*
  * If needs_backup is true or WAL checking is enabled for current
  * resource manager, log a full-page write for the current block.
  */
 include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;

So I guess it copies the image on dodo, which has:

 'PG_TEST_EXTRA' => 'ssl ldap
kerberos wal_consistency_checking libpq_encryption xid_wraparound'

Perhaps a no-image, no-change registered buffer should not be
including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
useless for consistency checking too I guess, this issue aside,
because it doesn't change anything so there is nothing to check.


Yes, I think something is wrong here. I've reduced the reproducer to:
cat << 'EOF' | psql
CREATE TABLE hash_cleanup_heap(keycol INT);
CREATE INDEX hash_cleanup_index on hash_cleanup_heap USING HASH (keycol);

BEGIN;
INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;
ROLLBACK;
EOF

cat << 'EOF' | psql &
INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;

DROP TABLE hash_cleanup_heap;
EOF

cat << 'EOF' | psql &
SELECT pg_sleep(random() / 20);
VACUUM hash_cleanup_heap;
EOF
wait
grep 'TRAP:' server.log

("wal_consistency_checking = all" and the xloginsert patch are still required)

and with additional logging I see:
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: 
!!!hashbucketcleanup| scan page buf: 1832
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of 
relation "public.hash_cleanup_heap"

2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT:  VACUUM 
hash_cleanup_heap;
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: 
!!!_hash_doinsert| _hash_getbucketbuf_from_hashkey: 1822
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT:  INSERT INTO hash_cleanup_heap SELECT 1 FROM 
generate_series(1, 500) as i;
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| xlrec.is_primary_bucket_page: 0, 
buf: 1832, bucket_buf: 1822
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of 
relation "public.hash_cleanup_heap"

2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT:  VACUUM 
hash_cleanup_heap;
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: 
!!!_hash_doinsert| _hash_relbuf(rel, 1822)
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT:  INSERT INTO hash_cleanup_heap SELECT 1 FROM 
generate_series(1, 500) as i;

TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: 
"xloginsert.c", Line: 949, PID: 2070237

Best regards,
Alexander




WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Alexander Lakhin

Hello hackers,

I've investigated a recent buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo=2024-05-02%2006%3A40%3A36

where the test failed due to a CRC error:
2024-05-02 17:08:18.401 ACST [3406:7] LOG:  incorrect resource manager data 
checksum in record at 0/F14D7A60

(Chipmunk produced similar errors as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk=2022-08-25%2019%3A40%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk=2024-03-22%2003%3A20%3A39
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk=2023-08-19%2006%3A43%3A20
)

and discovered that XLogRecordAssemble() calculates the CRC over a buffer
that might be modified by another process.

With the attached patch applied, the following test run:
echo "
autovacuum_naptime = 1
autovacuum_vacuum_threshold = 1

wal_consistency_checking = all
" >/tmp/temp.config

for ((i=1;i<=100;i++)); do echo "iteration $i"; TEMP_CONFIG=/tmp/temp.config TESTS="test_setup hash_index" make 
check-tests -s || break; done


fails for me on iterations 7, 10, 17:
ok 1 - test_setup   2557 ms
not ok 2 - hash_index  24719 ms
# (test process exited with exit code 2)

postmaster.log contains:
2024-05-10 12:46:44.320 UTC checkpointer[1881151] LOG:  checkpoint starting: 
immediate force wait
2024-05-10 12:46:44.365 UTC checkpointer[1881151] LOG:  checkpoint complete: wrote 41 buffers (0.3%); 0 WAL file(s) 
added, 0 removed, 26 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, average=0.000 
s; distance=439134 kB, estimate=527137 kB; lsn=0/3CE131F0, redo lsn=0/3CE13198

TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: 
"xloginsert.c", Line: 949, PID: 1881271
ExceptionalCondition at assert.c:52:13
XLogRecordAssemble at xloginsert.c:953:1
XLogInsert at xloginsert.c:520:9
hashbucketcleanup at hash.c:844:14
hashbulkdelete at hash.c:558:3
index_bulk_delete at indexam.c:760:1
vac_bulkdel_one_index at vacuum.c:2498:10
lazy_vacuum_one_index at vacuumlazy.c:2443:10
lazy_vacuum_all_indexes at vacuumlazy.c:2026:26
lazy_vacuum at vacuumlazy.c:1944:10
lazy_scan_heap at vacuumlazy.c:1050:3
heap_vacuum_rel at vacuumlazy.c:503:2
vacuum_rel at vacuum.c:2214:2
vacuum at vacuum.c:622:8
autovacuum_do_vac_analyze at autovacuum.c:3102:2
do_autovacuum at autovacuum.c:2425:23
AutoVacWorkerMain at autovacuum.c:1569:3
pgarch_die at pgarch.c:846:1
StartChildProcess at postmaster.c:3929:5
StartAutovacuumWorker at postmaster.c:3997:12
process_pm_pmsignal at postmaster.c:3809:3
ServerLoop at postmaster.c:1667:5
BackgroundWorkerInitializeConnection at postmaster.c:4156:1
main at main.c:184:3
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc71a8d7e40]
postgres: autovacuum worker regression(_start+0x25)[0x556a8631a5e5]
2024-05-10 12:46:45.038 UTC checkpointer[1881151] LOG:  checkpoint starting: 
immediate force wait
2024-05-10 12:46:45.965 UTC autovacuum worker[1881275] LOG: automatic analyze of table 
"regression.pg_catalog.pg_attribute"
    avg read rate: 0.000 MB/s, avg write rate: 5.409 MB/s
    buffer usage: 1094 hits, 0 misses, 27 dirtied
    system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.03 s
2024-05-10 12:46:46.892 UTC postmaster[1881150] LOG:  server process (PID 
1881271) was terminated by signal 6: Aborted
2024-05-10 12:46:46.892 UTC postmaster[1881150] DETAIL:  Failed process was running: autovacuum: VACUUM ANALYZE 
public.hash_cleanup_heap


(This can be reproduced with 027_stream_regress, of course, but it would
take more time.)

Best regards,
Alexander

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 9047601534..aeb462c137 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -902,8 +902,21 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	 */
 	INIT_CRC32C(rdata_crc);
 	COMP_CRC32C(rdata_crc, hdr_scratch + SizeOfXLogRecord, hdr_rdt.len - SizeOfXLogRecord);
+
+char block1_copy[BLCKSZ];
+char *block1_ptr = NULL;
+int block1_len = 0;
+
 	for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
+{
+	if (!block1_ptr)
+	{
+		memcpy(block1_copy, rdt->data, rdt->len);
+		block1_ptr = rdt->data;
+		block1_len = rdt->len;
+	}
 		COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
+}
 
 	/*
 	 * Ensure that the XLogRecord is not too large.
@@ -930,6 +943,12 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	rechdr->xl_prev = InvalidXLogRecPtr;
 	rechdr->xl_crc = rdata_crc;
 
+if (block1_ptr)
+{
+pg_usleep(100);
+Assert(memcmp(block1_ptr, block1_copy, block1_len) == 0);
+}
+
 	return _rdt;
 }
 


Non-systematic handling of EINTR/EAGAIN/EWOULDBLOCK

2024-05-08 Thread Alexander Lakhin

Hello hackers,

Looking at a recent failure on the buildfarm:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork=2024-04-30%2020%3A48%3A34

# poll_query_until timed out executing this query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
[23:01:41] t/020_archive_status.pl ..
Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests

with the following error in the log:
2024-04-30 22:57:27.931 CEST [83115:1] LOG:  archive command failed with exit 
code 1
2024-04-30 22:57:27.931 CEST [83115:2] DETAIL:  The failed archive command was: cp 
"pg_wal/00010001_does_not_exist" "00010001_does_not_exist"

...
2024-04-30 22:57:28.070 CEST [47962:2] [unknown] LOG:  connection authorized: user=pgbf database=postgres 
application_name=020_archive_status.pl

2024-04-30 22:57:28.072 CEST [47962:3] 020_archive_status.pl LOG: statement: 
SELECT archived_count FROM pg_stat_archiver
2024-04-30 22:57:28.073 CEST [83115:3] LOG:  could not send to statistics 
collector: Resource temporarily unavailable

and the corresponding code (on REL_13_STABLE):
static void
pgstat_send(void *msg, int len)
{
    int rc;

    if (pgStatSock == PGINVALID_SOCKET)
    return;

    ((PgStat_MsgHdr *) msg)->m_size = len;

    /* We'll retry after EINTR, but ignore all other failures */
    do
    {
    rc = send(pgStatSock, msg, len, 0);
    } while (rc < 0 && errno == EINTR);

#ifdef USE_ASSERT_CHECKING
    /* In debug builds, log send failures ... */
    if (rc < 0)
    elog(LOG, "could not send to statistics collector: %m");
#endif
}

I wonder whether this retry should also be performed after EAGAIN (Resource
temporarily unavailable) and EWOULDBLOCK.

With a simple send() wrapper (PFA) activated with LD_PRELOAD, I could
reproduce this failure easily when running
`make -s check -C src/test/recovery/ PROVE_TESTS="t/020*"` on
REL_13_STABLE:
t/020_archive_status.pl .. 1/16 # poll_query_until timed out executing this 
query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests

I also reproduced another failure (that lacks useful diagnostics, 
unfortunately):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork=2022-11-10%2015%3A30%3A16
...
t/020_archive_status.pl .. 8/16 # poll_query_until timed out executing this 
query:
# SELECT last_archived_wal FROM pg_stat_archiver
# expecting this output:
# 00010002
# last actual query output:
# 00010001
# with stderr:
# Looks like your test exited with 29 just after 13.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/16 subtests
...

The "n == 64" condition in the cranky send() is needed to aim exactly
these failures. Without this restriction the test (and also `make check`)
just hangs because of:
    if (errno == EINTR)
    continue;   /* Ok if we were interrupted */

    /*
 * Ok if no data writable without blocking, and the socket is in
 * non-blocking mode.
 */
    if (errno == EAGAIN ||
    errno == EWOULDBLOCK)
    {
    return 0;
    }
in internal_flush_buffer().

On the other hand, even with:
int
send(int s, const void *buf, size_t n, int flags)
{
    if (rand() % 1 == 0)
    {
    errno = EINTR;
    return -1;
    }
    return real_send(s, buf, n, flags);
}

`make check` fails with many miscellaneous errors...

Best regards,
Alexander

#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t (*real_send)(int s, const void *buf, size_t n, int flags) = NULL;

__attribute__((constructor))
void
lib_init(void)
{
	real_send = dlsym(RTLD_NEXT,"send");
}

int
send(int s, const void *buf, size_t n, int flags)
{
	if (n == 64 && rand() % 10 == 0)
	{
		errno = EAGAIN;
		return -1;
	}
	return real_send(s, buf, n, flags);
}


Test equivclass interferes with tests tsearch and advisory_lock

2024-05-06 Thread Alexander Lakhin

Hello hackers,

While trying to catch a sporadic regression test failure, I've discovered
that tests tsearch and advisory_lock in the parallel_schedule's group:
test: select_views ...  tsearch ... advisory_lock indirect_toast equivclass

might fail, depending on timing, because the test equivclass creates and
drops int alias operators.

With the attached patch applied to make the tests in question repeatable, the
following test run fails for me:
(printf 'test: test_setup\n'; \
 printf 'test: tsearch equivclass %.0s\n' `seq 300`) >/tmp/test_schedule;
make -s check-tests TESTS="--schedule=/tmp/test_schedule"
...
# 17 of 601 tests failed.

regression.diffs contains:
diff -U3 .../src/test/regress/expected/tsearch.out 
.../src/test/regress/results/tsearch.out
--- .../src/test/regress/expected/tsearch.out   2024-05-06 
08:11:54.892649407 +
+++ .../src/test/regress/results/tsearch.out    2024-05-06 
08:13:35.514113420 +
@@ -16,10 +16,7 @@
 WHERE prsnamespace = 0 OR prsstart = 0 OR prstoken = 0 OR prsend = 0 OR
   -- prsheadline is optional
   prslextype = 0;
- oid | prsname
--+-
-(0 rows)
-
+ERROR:  cache lookup failed for type 18517
 SELECT oid, dictname
...

Or with advisory_lock:
(printf 'test: test_setup\n'; \
 printf 'test: advisory_lock equivclass %.0s\n' `seq 300`) >/tmp/test_schedule;
make -s check-tests TESTS="--schedule=/tmp/test_schedule"
...
# 1 of 601 tests failed.

regression.diffs contains:
diff -U3 .../src/test/regress/expected/advisory_lock.out 
.../src/test/regress/results/advisory_lock.out
--- .../src/test/regress/expected/advisory_lock.out 2024-04-10 
14:36:57.709586678 +
+++ .../src/test/regress/results/advisory_lock.out  2024-05-06 
08:15:09.235456794 +
@@ -14,40 +14,17 @@
 SELECT locktype, classid, objid, objsubid, mode, granted
    FROM pg_locks WHERE locktype = 'advisory' AND database = :datoid
    ORDER BY classid, objid, objsubid;
- locktype | classid | objid | objsubid | mode  | granted
---+-+---+--+---+-
- advisory |   0 | 1 |    1 | ExclusiveLock | t
- advisory |   0 | 2 |    1 | ShareLock | t
- advisory |   1 | 1 |    2 | ExclusiveLock | t
- advisory |   2 | 2 |    2 | ShareLock | t
-(4 rows)
-
+ERROR:  cache lookup failed for type 17976
 -- pg_advisory_unlock_all() shouldn't release xact locks
...

With backtrace_functions = 'getBaseTypeAndTypmod' specified,
I see the following stack trace of the error:
2024-05-06 08:30:51.344 UTC client backend[869479] pg_regress/tsearch ERROR:  
cache lookup failed for type 18041
2024-05-06 08:30:51.344 UTC client backend[869479] pg_regress/tsearch BACKTRACE:
getBaseTypeAndTypmod at lsyscache.c:2550:4
getBaseType at lsyscache.c:2526:1
find_coercion_pathway at parse_coerce.c:3131:18
can_coerce_type at parse_coerce.c:598:6
func_match_argtypes at parse_func.c:939:6
oper_select_candidate at parse_oper.c:327:5
oper at parse_oper.c:428:6
make_op at parse_oper.c:696:30
transformBoolExpr at parse_expr.c:1431:9
 (inlined by) transformExprRecurse at parse_expr.c:226:13
transformExpr at parse_expr.c:133:22
transformWhereClause at parse_clause.c:1867:1
transformSelectStmt at analyze.c:1382:20
 (inlined by) transformStmt at analyze.c:368:15
parse_analyze_fixedparams at analyze.c:110:15
pg_analyze_and_rewrite_fixedparams at postgres.c:691:5
 (inlined by) exec_simple_query at postgres.c:1190:20
PostgresMain at postgres.c:4680:7
BackendMain at backend_startup.c:61:2

Best regards,
Alexander

diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 3d5de28354..2605554380 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -1,3 +1,7 @@
+SET client_min_messages TO 'warning';
+DROP TYPE IF EXISTS int8alias1, int8alias2 CASCADE;
+DROP TABLE IF EXISTS ec0, ec1, ec2 CASCADE;
+RESET client_min_messages;
 --
 -- Tests for the planner's "equivalence class" mechanism
 --
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index cfa391ac51..6839be5f37 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1,3 +1,6 @@
+SET client_min_messages TO 'warning';
+DROP TABLE IF EXISTS test_tsvector, test_tsquery CASCADE;
+RESET client_min_messages;
 -- directory paths are passed to us in environment variables
 \getenv abs_srcdir PG_ABS_SRCDIR
 --
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 77dd964ebf..fec4a416c8 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -1,3 +1,7 @@
+SET client_min_messages TO 'warning';
+DROP TYPE IF EXISTS int8alias1, int8alias2 CASCADE;
+DROP TABLE IF EXISTS ec0, ec1, ec2 CASCADE;
+RESET client_min_messages;
 --
 -- Tests for the planner's "equivalence class" mechanism
 --
diff --git a/src/test/regress/sql/tsearch.sql 

Re: cataloguing NOT NULL constraints

2024-05-02 Thread Alexander Lakhin

02.05.2024 19:21, Alvaro Herrera wrote:


Now, you could claim that the standard doesn't mention
INCLUDING/EXCLUDING CONSTRAINTS, therefore since we have come up with
its definition then we should make it affect not-null constraints.
However, there's also this note:

   NOTE 520 — s, except for NOT NULL, are not included in
   CDi; s are effectively transformed to s and are thereby also excluded.

which is explicitly saying that not-null constraints are treated
differently; in essence, with INCLUDING CONSTRAINTS we choose to affect
the constraints that the standard says to ignore.


Thank you for very detailed and convincing explanation!

Now I see what the last sentence here (from [1]) means:
INCLUDING CONSTRAINTS

    CHECK constraints will be copied. No distinction is made between
    column constraints and table constraints. _Not-null constraints are
    always copied to the new table._

(I hadn't paid enough attention to it, because this exact paragraph is
also present in previous versions...)

[1] https://www.postgresql.org/docs/devel/sql-createtable.html

Best regards,
Alexander




Re: cataloguing NOT NULL constraints

2024-05-02 Thread Alexander Lakhin

Hello Alvaro,

01.05.2024 20:49, Alvaro Herrera wrote:

Here are two patches that I intend to push soon (hopefully tomorrow).



Thank you for fixing those issues!

Could you also clarify, please, how CREATE TABLE ... LIKE is expected to
work with NOT NULL constraints?

I wonder whether EXCLUDING CONSTRAINTS (ALL) should cover not-null
constraints too. What I'm seeing now is that:
CREATE TABLE t1 (i int, CONSTRAINT nn NOT NULL i);
CREATE TABLE t2 (LIKE t1 EXCLUDING ALL);
\d+ t2
-- ends with:
Not-null constraints:
    "nn" NOT NULL "i"

Or a similar case with PRIMARY KEY:
CREATE TABLE t1 (i int PRIMARY KEY);
CREATE TABLE t2 (LIKE t1 EXCLUDING CONSTRAINTS EXCLUDING INDEXES);
\d+ t2
-- leaves:
Not-null constraints:
    "t2_i_not_null" NOT NULL "i"

Best regards,
Alexander




Re: Typos in the code and README

2024-05-02 Thread Alexander Lakhin

Hello,

28.04.2024 11:05, David Rowley wrote:


On Sat, 20 Apr 2024 at 16:09, David Rowley  wrote:

Here are a few more to see if it motivates anyone else to do a more
thorough search for another batch.

I've pushed these now.


Please look also at the list of other typos and inconsistencies I've found
on the master branch:
additional_notnulls -> old_notnulls (cf. b0e96f311)
ATAddCheckConstraint ->? ATAddCheckNNConstraint (cf. b0e96f311)
bt_page_check -> bt_target_page_check (cf. 5ae208720)
calblack_arg -> callback_arg
combinig -> combining
compaining -> complaining
ctllock - remove (cf. 53c2a97a9)
dabatase -> database
eval_const_exprs_mutator -> eval_const_expressions_mutator
ExecEndValuesScan - remove (cf. d060e921e)
get_json_nested_columns -> get_json_table_nested_columns
io_direct -> debug_io_direct
iS ->? IS (a neatnik-ism)
joing ->? join (cf. 20f90a0e4)
_jumbleRangeTblEntry - remove (cf. 367c989cd)
Mallroy -> Mallory
onstead -> instead
procedual -> procedural
psql_safe -> safe_psql
read_quoted_pattern -> read_quoted_string
repertiore -> repertoire
rightfirstdataoffset ->? rightfirstoffset (cf. 5ae208720)
Sincle ->? Since (perhaps the whole sentence could be rephrased)
sslnegotition=direct/requiredirectwould -> sslnegotiation=direct/requiredirect 
would
sslnegotitation -> sslnegotiation
walreciever -> walreceiver
xid_wrapround -> xid_wraparound

(some of them are located in doc/, so it's not a code-only change)
I've attached the patch for your convenience, though maybe some
of the suggestions are to be discarded.

Best regards,
Alexander

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 20da4a46ba..70f65b645a 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1086,7 +1086,7 @@ bt_entry_unique_check(BtreeCheckState *state, IndexTuple itup,
 
 	/*
 	 * Current tuple has no posting list. If TID is visible save info about it
-	 * for the next comparisons in the loop in bt_page_check(). Report
+	 * for the next comparisons in the loop in bt_target_page_check(). Report
 	 * duplicate if lVis_tid is already valid.
 	 */
 	else
@@ -1953,7 +1953,7 @@ bt_target_page_check(BtreeCheckState *state)
  * Note that !readonly callers must reverify that target page has not
  * been concurrently deleted.
  *
- * Save rightfirstdataoffset for detailed error message.
+ * Save rightfirstoffset for detailed error message.
  */
 static BTScanInsert
 bt_right_page_check_scankey(BtreeCheckState *state, OffsetNumber *rightfirstoffset)
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d7e78204ad..f4b00399e4 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -331,7 +331,7 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
  xid_wraparound
  
   
-   Runs the test suite under src/test/module/xid_wrapround.
+   Runs the test suite under src/test/module/xid_wraparound.
Not enabled by default because it is resource intensive.
   
  
diff --git a/doc/src/sgml/targets-meson.txt b/doc/src/sgml/targets-meson.txt
index 1c42dd3028..bd470c87a7 100644
--- a/doc/src/sgml/targets-meson.txt
+++ b/doc/src/sgml/targets-meson.txt
@@ -11,7 +11,7 @@ Code Targets:
   backend   Build backend and related modules
   bin   Build frontend binaries
   contrib   Build contrib modules
-  plBuild procedual languages
+  plBuild procedural languages
 
 Developer Targets:
   reformat-dat-filesRewrite catalog data files into standard format
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index bf28400dd8..d0f3848c95 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2598,7 +2598,7 @@ _brin_parallel_heapscan(BrinBuildState *state)
  *
  * After waiting for all workers to finish, merge the per-worker results into
  * the complete index. The results from each worker are sorted by block number
- * (start of the page range). While combinig the per-worker results we merge
+ * (start of the page range). While combining the per-worker results we merge
  * summaries for the same page range, and also fill-in empty summaries for
  * ranges without any tuples.
  *
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f99ec38a4a..77b05cc0a7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -228,7 +228,6 @@ SimpleLruAutotuneBuffers(int divisor, int max)
  * name: name of SLRU.  (This is user-visible, pick with care!)
  * nslots: number of page slots to use.
  * nlsns: number of LSN groups per page (set to zero if not relevant).
- * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
  * buffer_tranche_id: tranche ID to use for the SLRU's 

Re: Removing unneeded self joins

2024-05-01 Thread Alexander Lakhin

30.04.2024 13:20, Alexander Korotkov wrote:

On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  wrote:

I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars.  The draft patch for this is
attached.  I'd like to ask Alexander to test it, Richard and Andrei to
review it.  Thank you!


Besides LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
in general. For instance:
CREATE TABLE t (a int unique);
INSERT INTO t SELECT * FROM generate_series (1,100);

SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
 NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
returned 100, 100, 100 for me, though with enable_self_join_removal = off,
I got 4, 0, 1...
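
Presumably SJE collapses the two differently-sampled scans into a single scan,
which changes the result; I guess the plan difference can be confirmed with
something like:
EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
 NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;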

Best regards,
Alexander




Re: Avoid orphaned objects dependencies, take 3

2024-04-30 Thread Alexander Lakhin

Hi Bertrand,

25.04.2024 10:20, Bertrand Drouvot wrote:

postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
ERROR:  cache lookup failed for function 16400

This stuff does appear before we get a chance to call the new 
depLockAndCheckObject()
function.

I think this is what Tom was referring to in [1]:

"
So the only real fix for this would be to make every object lookup in the entire
system do the sort of dance that's done in RangeVarGetRelidExtended.
"

The fact that those kind of errors appear also somehow ensure that no orphaned
dependencies can be created.


I agree; the only thing that I'd change here is the error code.

But I've discovered yet another possibility to get a broken dependency.
Please try this script:
res=0
numclients=20
for ((i=1;i<=100;i++)); do
for ((c=1;c<=numclients;c++)); do
  echo "
CREATE SCHEMA s_$c;
CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
  " | psql >psql1-$c.log 2>&1 &
  echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
done
wait
pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
for ((c=1;c<=numclients;c++)); do
  echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
done
done
psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM 
pg_namespace);"

It fails for me (with the v4 patch applied) as follows:
pg_dump: error: schema with OID 16392 does not exist
on iteration 1
  oid  | conname  | connamespace | conowner | conforencoding | contoencoding |  
    conproc  | condefault
---+--+--+--++---+---+
 16396 | myconv_6 |    16392 |   10 |  8 | 6 | 
iso8859_1_to_utf8 | f

Best regards,
Alexander




Re: Removing unneeded self joins

2024-04-30 Thread Alexander Lakhin

Hello Alexander,

23.10.2023 12:47, Alexander Korotkov wrote:

I think this patch makes substantial improvement to query planning.
It has received plenty of reviews.  The code is currently in quite
good shape.  I didn't manage to find the cases when this optimization
causes significant overhead to planning time.  Even if such cases will
be spotted there is a GUC option to disable this feature.  So, I'll
push this if there are no objections.


I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
 (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

With asserts enabled, it triggers
TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: 
"initsplan.c", Line: 697, PID: 3074054
ExceptionalCondition at assert.c:52:13
create_lateral_join_info at initsplan.c:700:8
query_planner at planmain.c:257:2
grouping_planner at planner.c:1523:17
subquery_planner at planner.c:1098:2
standard_planner at planner.c:415:9
planner at planner.c:282:12
pg_plan_query at postgres.c:904:9
pg_plan_queries at postgres.c:996:11
exec_simple_query at postgres.c:1193:19
PostgresMain at postgres.c:4684:27

With no asserts, I get:
ERROR:  failed to construct the join relation

Please take a look at this.

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-29 Thread Alexander Lakhin

30.04.2024 03:10, Dmitry Koval wrote:

Hi!

1.
29.04.2024 21:00, Alexander Lakhin wrote:

I still wonder, why that constraint (now with a less questionable name) is
created during MERGE?


The SPLIT/MERGE PARTITION(S) commands for creating partitions reuse the existing code of CREATE TABLE .. LIKE ... 
command. A new partition was created with the name "merge-16385-26BCB0-tmp" (since there was an old partition with the 
same name). The constraint "merge-16385-26BCB0-tmp_i_not_null" was created too together with the partition. 
Subsequently, the table was renamed, but the constraint was not.

Now a new partition is immediately created with the correct name (the old 
partition is renamed).


Maybe I'm doing something wrong, but the following script:
CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);

CREATE TABLE t2 (LIKE t INCLUDING ALL);
CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL);
creates tables t2, tp2 without not-null constraints.

But after
ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;
I see:
\d+ tp_0
...
Indexes:
    "tp_0_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
    "tp_0_i_not_null" NOT NULL "i"

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-29 Thread Alexander Lakhin

Hi Dmitry,

19.04.2024 02:26, Dmitry Koval wrote:


18.04.2024 19:00, Alexander Lakhin wrote:

leaves a strange constraint:
\d+ t*
   Table "public.tp_0"
...
Not-null constraints:
 "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"


Thanks!
Attached fix (with test) for this case.
The patch should be applied after patches
v6-0001- ... .patch ... v6-0004- ... .patch


I still wonder why that constraint (now with a less questionable name) is
created during MERGE?

That is, before MERGE, two partitions have only PRIMARY KEY indexes,
with no not-null constraint, and you can manually remove the constraint
after MERGE, so maybe it's not necessary...

Best regards,
Alexander




Re: A failure in prepared_xacts test

2024-04-28 Thread Alexander Lakhin

Hello Tom and Michael,

29.04.2024 08:11, Tom Lane wrote:

Michael Paquier  writes:

If you grep the source tree, you'd notice that a prepared transaction
named gxid only exists in the 2PC tests of ECPG, in
src/interfaces/ecpg/test/sql/twophase.pgc.  So the origin of the
failure comes from a race condition due to test parallelization,
because the scan of pg_prepared_xacts affects all databases with
installcheck, and in your case it means that the scan of
pg_prepared_xacts was running in parallel of the ECPG tests with an
installcheck.

Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact.  It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree.  Maybe that was a bad idea and we should
fix the meson infrastructure to not do that.  I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.


Yes, I'm afraid of the same. For example, the test failure [1] is of that
ilk, I guess.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual=2024-04-17%2016%3A33%3A23

Best regards,
Alexander




Re: A failure in prepared_xacts test

2024-04-28 Thread Alexander Lakhin

Hello Richard,

29.04.2024 04:12, Richard Guo wrote:

Does anyone have any clue to this failure?

FWIW, after another run of this test, the failure just disappears.  Does
it suggest that the test case is flaky?



I think this could be caused by the ecpg test twophase being executed
simultaneously with the test prepared_xacts, thanks to meson's job
parallelization.

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-28 Thread Alexander Lakhin

Hello,

28.04.2024 03:59, Alexander Korotkov wrote:

The revised patchset is attached.  I'm going to push it if there are
no objections.


I have one additional question regarding security, if you don't mind:
What permissions should a user have to perform split/merge?

When we deal with mixed ownership, say, bob is an owner of a
partitioned table, but not an owner of a partition, should we
allow him to perform merge with that partition?
Consider the following script:
CREATE ROLE alice;
GRANT CREATE ON SCHEMA public TO alice;

SET SESSION AUTHORIZATION alice;
CREATE TABLE t (i int PRIMARY KEY, t text, u text) PARTITION BY RANGE (i);
CREATE TABLE tp_00 PARTITION OF t FOR VALUES FROM (0) TO (10);
CREATE TABLE tp_10 PARTITION OF t FOR VALUES FROM (10) TO (20);

CREATE POLICY p1 ON tp_00 USING (u = current_user);
ALTER TABLE tp_00 ENABLE ROW LEVEL SECURITY;

INSERT INTO t(i, t, u)  VALUES (0, 'info for bob', 'bob');
INSERT INTO t(i, t, u)  VALUES (1, 'info for alice', 'alice');
RESET SESSION AUTHORIZATION;

CREATE ROLE bob;
GRANT CREATE ON SCHEMA public TO bob;
ALTER TABLE t OWNER TO bob;
GRANT SELECT ON TABLE tp_00 TO bob;

SET SESSION AUTHORIZATION bob;
SELECT * FROM tp_00;
--- here bob can see his info only
\d
 Schema | Name  |   Type    | Owner
+---+---+---
 public | t | partitioned table | bob
 public | tp_00 | table | alice
 public | tp_10 | table | alice

-- but then bob can do:
ALTER TABLE t MERGE PARTITIONS (tp_00, tp_10) INTO tp_00;
-- (yes, he can also detach the partition tp_00, but then he couldn't
-- re-attach or read it)

\d
 Schema | Name  |   Type    | Owner
+---+---+---
 public | t | partitioned table | bob
 public | tp_00 | table | bob

Thus bob has effectively captured the partition with the data.
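
(For the record, the ownership change can be cross-checked from the catalogs
as well; a hypothetical equivalent of the \d output above:
SELECT relname, pg_get_userbyid(relowner) AS owner
  FROM pg_class WHERE relname IN ('t', 'tp_00');
)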

What do you think, does this create a new security risk?

Best regards,
Alexander




Re: partitioning and identity column

2024-04-28 Thread Alexander Lakhin

27.04.2024 18:00, Alexander Lakhin wrote:


Please look also at another script, which produces the same error:


I've discovered yet another problematic case:
CREATE TABLE tbl1 (a int GENERATED ALWAYS AS IDENTITY, b text)
    PARTITION BY LIST (a);
CREATE TABLE tbl2 (b text, a int NOT NULL);
ALTER TABLE tbl1 ATTACH PARTITION tbl2 DEFAULT;

INSERT INTO tbl2 DEFAULT VALUES;
ERROR:  no owned sequence found

Though it works with tbl2(a int NOT NULL, b text)...
Take a look at this too, please.

Best regards,
Alexander




Re: partitioning and identity column

2024-04-27 Thread Alexander Lakhin

Hello Ashutosh,

26.04.2024 21:00, Alexander Lakhin wrote:

26.04.2024 15:57, Ashutosh Bapat wrote:

Thanks Alexander for the report.

On Fri, Apr 26, 2024 at 5:30 PM Alexander Lakhin  wrote:


CREATE TABLE tbl3 (LIKE tbl2 INCLUDING IDENTITY);
ERROR:  no owned sequence found






Do you want to track this in open items?



If you are inclined to fix this behavior,  I would add this item.



Please look also at another script, which produces the same error:
CREATE TABLE tbl1 (a int GENERATED BY DEFAULT AS IDENTITY, b text)
   PARTITION BY LIST (b);
CREATE TABLE tbl2 PARTITION OF tbl1 DEFAULT;

ALTER TABLE tbl1 ALTER COLUMN a SET DATA TYPE bigint;
ERROR:  no owned sequence found

(On 699586315~1, it executes successfully and changes the data type of the
identity column and its sequence.)

Best regards,
Alexander

Re: partitioning and identity column

2024-04-26 Thread Alexander Lakhin

26.04.2024 15:57, Ashutosh Bapat wrote:

Thanks Alexander for the report.

On Fri, Apr 26, 2024 at 5:30 PM Alexander Lakhin  wrote:


CREATE TABLE tbl3 (LIKE tbl2 INCLUDING IDENTITY);
ERROR:  no owned sequence found


I don't think creating a table like a partition is common or even useful. Usually it would be created from the
partitioned table. But if we consider that to be a use case, I think the error is expected since a partition doesn't have its 
own identity; it shares it with the partitioned table. Maybe we could give a better message. But I will look into this 
and fix it if the solution makes sense.


Maybe it's uncommon, but it's allowed, so users may want to
CREATE TABLE sometable (LIKE partX INCLUDING ALL), for example, if the
partition has a somewhat different structure. And thinking about how such
a restriction could be described in the docs, I would prefer to avoid this
error at the implementation level.



Do you want to track this in open items?



If you are inclined to fix this behavior,  I would add this item.

Best regards,
Alexander

Re: partitioning and identity column

2024-04-26 Thread Alexander Lakhin

Hello Ashutosh and Peter,

16.01.2024 21:59, Peter Eisentraut wrote:

On 09.01.24 15:10, Ashutosh Bapat wrote:

Here's complete patch-set.


Looks good!  Committed.



Please take a look at a new error case introduced by 699586315:
CREATE TABLE tbl1 (a int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY)
  PARTITION BY LIST (a);
CREATE TABLE tbl2 PARTITION OF tbl1 DEFAULT;

CREATE TABLE tbl3 (LIKE tbl2 INCLUDING IDENTITY);
ERROR:  no owned sequence found
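
(Apparently the partition's column is flagged as an identity column even
though the owned sequence belongs to the parent; a hypothetical way to see
the flag from the catalogs:
SELECT attrelid::regclass, attname, attidentity
  FROM pg_attribute
  WHERE attrelid IN ('tbl1'::regclass, 'tbl2'::regclass) AND attname = 'a';
)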

Best regards,
Alexander




Re: Avoid orphaned objects dependencies, take 3

2024-04-25 Thread Alexander Lakhin

Hi,

25.04.2024 08:00, Bertrand Drouvot wrote:



though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which is assumed to be not easily triggered by users.

I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?


You can try, for example, table-trigger, or other tests that check for
"cache lookup failed" psql output only (maybe you'll need to increase the
iteration count). For instance, I've got (with v4 applied):
2024-04-25 05:48:08.102 UTC [3638763] ERROR:  cache lookup failed for function 
20893
2024-04-25 05:48:08.102 UTC [3638763] STATEMENT:  CREATE TRIGGER modified_c1 
BEFORE UPDATE OF c ON t
    FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE 
trigger_func('modified_c');

Or with function-function:
2024-04-25 05:52:31.390 UTC [3711897] ERROR:  cache lookup failed for function 
32190 at character 54
2024-04-25 05:52:31.390 UTC [3711897] STATEMENT:  CREATE FUNCTION f1() RETURNS 
int LANGUAGE SQL RETURN f() + 1;
--
2024-04-25 05:52:37.639 UTC [3720011] ERROR:  cache lookup failed for function 
34465 at character 54
2024-04-25 05:52:37.639 UTC [3720011] STATEMENT:  CREATE FUNCTION f1() RETURNS 
int LANGUAGE SQL RETURN f() + 1;


Attached v4, simply adding more tests to v3.


Thank you for working on this!

Best regards,
Alexander

Re: cataloguing NOT NULL constraints

2024-04-24 Thread Alexander Lakhin

24.04.2024 20:36, Alvaro Herrera wrote:

So I added a restriction that we only accept such a change when
recursively adding a constraint, or during binary upgrade.  This should
limit the damage: you're no longer able to change an existing constraint
from NO INHERIT to YES INHERIT merely by doing another ALTER TABLE ADD
CONSTRAINT.

One thing that has me a little nervous about this whole business is
whether we're set up to error out where some child table down the
hierarchy has nulls, and we add a not-null constraint to it but fail to
do a verification scan.  I tried a couple of cases and AFAICS it works
correctly, but maybe there are other cases I haven't thought about where
it doesn't.



Thank you for the fix!

While studying the NO INHERIT option, I've noticed that the documentation
probably misses its specification for NOT NULL:
https://www.postgresql.org/docs/devel/sql-createtable.html

where column_constraint is:
...
[ CONSTRAINT constraint_name ]
{ NOT NULL |
  NULL |
  CHECK ( expression ) [ NO INHERIT ] |

Also, I've found a weird behaviour with a non-inherited NOT NULL
constraint for a partitioned table:
CREATE TABLE pt(a int NOT NULL NO INHERIT) PARTITION BY LIST (a);
CREATE TABLE dp(a int NOT NULL);
ALTER TABLE pt ATTACH PARTITION dp DEFAULT;
ALTER TABLE pt DETACH PARTITION dp;
fails with:
ERROR:  relation 16389 has non-inherited constraint "dp_a_not_null"

Though with an analogous check constraint, I get:
CREATE TABLE pt(a int, CONSTRAINT nna CHECK (a IS NOT NULL) NO INHERIT) 
PARTITION BY LIST (a);
ERROR:  cannot add NO INHERIT constraint to partitioned table "pt"

Best regards,
Alexander




Re: Avoid orphaned objects dependencies, take 3

2024-04-24 Thread Alexander Lakhin

Hi Bertrand,

24.04.2024 11:38, Bertrand Drouvot wrote:

Please find attached v2 that should not produce the issue anymore (I launched a
lot of attempts without any issues). v1 was not strong enough as it was not
always checking for the dependent object existence. v2 now always checks if the
object still exists after the additional lock acquisition attempt while 
recording
the dependency.

I still need to think about v2 but in the meantime could you please also give
v2 a try on you side?

I gave more thought to v2 and the approach seems reasonable to me. Basically 
what
it does is that in case the object is already dropped before we take the new 
lock
(while recording the dependency) then the error message is a "generic" one 
(means
it does not provide the description of the "already" dropped object). I think it
makes sense to write the patch that way by abandoning the patch's ambition to
tell the description of the dropped object in all the cases.

Of course, I would be happy to hear others thought about it.

Please find v3 attached (which is v2 with more comments).


Thank you for the improved version!

I can confirm that it fixes that case.
I've also tested other cases that fail on master (most of them also fail
with v1); please try/look at the attached script. (There could be other
broken-dependency cases, of course, but I think I've covered all the
representative ones.)

All tested cases work correctly with v3 applied — I couldn't get broken
dependencies, though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which is assumed to be not easily triggered by users.

Best regards,
Alexander

if [ "$1" == "function-schema" ]; then
res=1
for ((i=1;i<=300;i++)); do
  ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) 
>psql1.log 2>&1 & 
  echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
  "  | psql >psql2.log 2>&1 &
  wait
  psql -c "DROP SCHEMA s CASCADE" >psql3.log 2>&1
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc 
pp
  LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | 
psql
sleep 1
grep 'was terminated' server.log || res=0
fi


if [ "$1" == "function-function" ]; then
res=1
for ((i=1;i<=1000;i++)); do
( { for ((n=1;n<=20;n++)); do echo "DROP FUNCTION f();"; done } | psql ) 
>psql1.log 2>&1 &
( echo "
CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN 1;
CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
CREATE FUNCTION f2() RETURNS int LANGUAGE SQL RETURN f() + 2;
CREATE FUNCTION f3() RETURNS int LANGUAGE SQL RETURN f() + 3;
CREATE FUNCTION f4() RETURNS int LANGUAGE SQL RETURN f() + 4;
CREATE FUNCTION f5() RETURNS int LANGUAGE SQL RETURN f() + 5;
" | psql ) >psql2.log 2>&1 &
wait
echo "ALTER FUNCTION f1() RENAME TO f01; SELECT f01()" | psql 2>&1 | grep 
'cache lookup failed' && { echo "on iteration $i"; res=1; break; }

psql -c "DROP FUNCTION f() CASCADE" >psql3.log 2>&1
done

grep -A2 'cache lookup failed' server.log || res=0
fi


if [ "$1" == "function-rettype" ]; then
res=0
for ((i=1;i<=5000;i++)); do
( echo "
CREATE DOMAIN id AS int;
CREATE FUNCTION f1() RETURNS id LANGUAGE SQL RETURN 1;
CREATE FUNCTION f2() RETURNS id LANGUAGE SQL RETURN 2;
CREATE FUNCTION f3() RETURNS id LANGUAGE SQL RETURN 3;
CREATE FUNCTION f4() RETURNS id LANGUAGE SQL RETURN 4;
CREATE FUNCTION f5() RETURNS id LANGUAGE SQL RETURN 5;
" | psql ) >psql1.log 2>&1 &
( echo "SELECT pg_sleep(random() / 100); DROP DOMAIN id"  | psql ) >psql2.log 
2>&1 &
wait

psql -c "SELECT f1()" 2>&1 | grep 'cache lookup failed' && { echo "on iteration 
$i"; res=1; break; }
psql -c "DROP DOMAIN id CASCADE" >psql3.log 2>&1
done

[ $res == 1 ] && psql -c "SELECT pn.oid, proname, pronamespace, proowner, 
prolang, prorettype FROM pg_proc pp INNER JOIN pg_namespace pn ON 
(pp.pronamespace = pn.oid) WHERE pn.nspname='public'"
fi


if [ "$1" == "function-argtype" ]; then
res=0
for ((i=1;i<=5000;i++)); do
( echo "
CREATE DOMAIN id AS int;
CREATE FUNCTION f1(i id) RETURNS int LANGUAGE SQL RETURN 1;
CREATE FUNCTION f2(i id) RETURNS int LANGUAGE SQL RETURN 2;
CREATE FUNCTION f3(i id) RETURNS int LANGUAGE SQL RETURN 3;
CREATE FUNCTION f4(i id) RETURNS int LANGUAGE SQL RETURN 4;
CREATE FUNCTION f5(i id) RETURNS int LANGUAGE SQL RETURN 5;
" | psql ) >psql1.log 2>&1 &
( echo "SELECT pg_sleep(random() / 1000); DROP DOMAIN id" | psql ) >psql2.log 
2>&1 &
wait

psql -c "SELECT f1(1)" 2>&1 | grep 'cache lookup failed' && { echo "on 
iteration $i"; res=1; break; }
psql -q -c "DROP DOMAIN id CASCADE" >/dev/null 2>&1
done

grep -A1 'cache lookup failed' server.log || res=0
fi

Re: Avoid orphaned objects dependencies, take 3

2024-04-22 Thread Alexander Lakhin

22.04.2024 13:52, Bertrand Drouvot wrote:


That's weird, I just launched it several times with the patch applied and I'm 
not
able to see the seg fault (while I can see it constently failing on the master
branch).

Are you 100% sure you tested it against a binary with the patch applied?



Yes, at least I can't see what I'm doing wrong. Please try my
self-contained script attached.

Best regards,
Alexander

#!/bin/bash

git reset --hard HEAD; git clean -dfx >/dev/null
wget 
https://www.postgresql.org/message-id/attachment/159685/v1-0001-Avoid-orphaned-objects-dependencies.patch
git apply v1-0001-Avoid-orphaned-objects-dependencies.patch || exit 1
CPPFLAGS="-Og" ./configure --enable-debug --enable-cassert -q && make -j8 -s && 
TESTS="test_setup" make -s check-tests

PGROOT=`pwd`/tmp_install
export PGDATA=`pwd`/tmpdb

export PATH="$PGROOT/usr/local/pgsql/bin:$PATH"
export LD_LIBRARY_PATH="$PGROOT/usr/local/pgsql/lib"

initdb >initdb.log 2>&1

export PGPORT=55432
echo "
port=$PGPORT
fsync = off
" > $PGDATA/postgresql.auto.conf

pg_ctl -l server.log start
export PGDATABASE=regression

createdb regression

for ((i=1;i<=300;i++)); do
  ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) 
>psql1.log 2>&1 & 
  echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
  "  | psql >psql2.log 2>&1 &
  wait
  psql -c "DROP SCHEMA s CASCADE" >psql3.log
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc 
pp
  LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | 
psql

pg_ctl -w -t 5 -D "$PGDATA" stop


Re: Avoid orphaned objects dependencies, take 3

2024-04-22 Thread Alexander Lakhin

Hi Bertrand,

22.04.2024 11:45, Bertrand Drouvot wrote:

Hi,

This new thread is a follow-up of [1] and [2].

Problem description:

We have occasionally observed objects having an orphaned dependency, the
most common case we have seen is functions not linked to any namespaces.

...

Looking forward to your feedback,


This has reminded me of bug #17182 [1].
Unfortunately, with the patch applied, the following script:

for ((i=1;i<=100;i++)); do
  ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 
2>&1 &
  echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
  "  | psql >psql2.log 2>&1 &
  wait
  psql -c "DROP SCHEMA s CASCADE" >psql3.log
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc 
pp
  LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | 
psql

still ends with:
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG:  server process (PID 1388378) was terminated by signal 11: 
Segmentation fault
2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL:  Failed process was running: SELECT 
pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp

  LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL

[1] 
https://www.postgresql.org/message-id/flat/17182-a6baa001dd1784be%40postgresql.org

Best regards,
Alexander




Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-04-22 Thread Alexander Lakhin

Hello Peter,

07.04.2024 20:18, Peter Geoghegan wrote:

On Sun, Apr 7, 2024 at 1:00 PM Alexander Lakhin  wrote:

SELECT * FROM t WHERE a < ANY (ARRAY[1]) AND b < ANY (ARRAY[1]);

TRAP: failed Assert("so->numArrayKeys"), File: "nbtutils.c", Line: 560, PID: 
3251267

I immediately see what's up here. Will fix this in the next short
while. There is no bug here in builds without assertions, but it's
probably best to keep the assertion, and to just make sure not to call
_bt_preprocess_array_keys_final() unless it has real work to do.


Please look at another case, where a similar Assert (but this time in
_bt_preprocess_keys()) is triggered:
CREATE TABLE t (a text, b text);
INSERT INTO t (a, b) SELECT 'a', repeat('b', 100) FROM generate_series(1, 500) 
g;
CREATE INDEX t_idx ON t USING btree(a);
BEGIN;
DECLARE c CURSOR FOR SELECT a FROM t WHERE a = 'a';
FETCH FROM c;
FETCH RELATIVE 0 FROM c;

TRAP: failed Assert("so->numArrayKeys"), File: "nbtutils.c", Line: 2582, PID: 
1130962

Best regards,
Alexander




Re: fix tablespace handling in pg_combinebackup

2024-04-22 Thread Alexander Lakhin

22.04.2024 00:49, Thomas Munro wrote:

On Mon, Apr 22, 2024 at 12:00 AM Alexander Lakhin  wrote:

  From what I can see, the following condition (namely, -l):
  if ($path =~ /^pg_tblspc\/(\d+)$/ && -l "$backup_path/$path")
  {
  push @tsoids, $1;
  return 0;
  }

is false for junction points on Windows (cf [1]), but the target path is:

Ah, yes, right, -l doesn't like junction points.  Well, we're already
using the Win32API::File package (it looks like a CPAN library, but I
guess the Windows perl distros like Strawberry are all including it
for free?).  See PostgreSQL::Test::Utils::is_symlink(), attached.
That seems to work as expected, but someone who actually knows perl
can surely make it better.  Then I hit the next problem:

readlink 
C:\cirrus\build/testrun/pg_combinebackup/002_compare_backups\data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/16415:
Inappropriate I/O control operation at
C:/cirrus/src/test/perl/PostgreSQL/Test/Cluster.pm line 927.

https://cirrus-ci.com/task/5162332353986560

I don't know where exactly that error message is coming from, but
assuming that Strawberry Perl contains this code:

https://github.com/Perl/perl5/blob/f936cd91ee430786a1bb6068a4a7c8362610dd5f/win32/win32.c#L2041
https://github.com/Perl/perl5/blob/f936cd91ee430786a1bb6068a4a7c8362610dd5f/win32/win32.c#L1976

... then it's *very* similar to what we're doing in our own
pgreadlink() code.  I wondered if the buffer size might be too small
for our path, but it doesn't seem so:

https://github.com/Perl/perl5/blob/f936cd91ee430786a1bb6068a4a7c8362610dd5f/win32/win32.c#L1581C1-L1581C35

(I think MAX_PATH is 256 on Windows.)

If there is some unfixable problem with what they're doing in their
readlink(), then I guess it should be possible to read the junction
point directly in Perl using Win32API::File::DeviceIoControl()... but
I can't see what's wrong with it!  Maybe it's not the same code?


I wonder whether the target path (\??\) of that junction point is fully correct.
I tried:
> mklink /j 
"C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/test" 
\\?\C:\t1
Junction created for 
C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/test 
<<===>> \\?\C:\t1

and
my $path = 
'C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/test';

my $result = readlink($path);
works for me:
result: \\?\C:\t1

Whilst with:
my $path = 
'C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/16415';

readlink() fails with "Invalid argument".

> dir 
"C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc"

04/22/2024  08:16 AM      .
04/22/2024  08:16 AM      ..
04/22/2024  06:52 AM     16415 
[\??\C:\Users\ADMINI~1\AppData\Local\Temp\1zznr8FW5N\ts1backup]
04/22/2024  08:16 AM     test [\\?\C:\t1]

> dir "\??\C:\Users\ADMINI~1\AppData\Local\Temp\1zznr8FW5N\ts1backup"
 Directory of C:\??\C:\Users\ADMINI~1\AppData\Local\Temp\1zznr8FW5N\ts1backup

File Not Found

> dir "\\?\C:\t1"
 Directory of \\?\C:\t1

04/22/2024  08:06 AM      .
04/22/2024  08:06 AM      ..
   0 File(s)  0 bytes

Though
> dir 
"C:/src/postgresql/build/testrun/pg_combinebackup/002_compare_backups/data/t_002_compare_backups_primary_data/backup/backup1/pg_tblspc/16415"

somehow really works:
 Directory of 
C:\src\postgresql\build\testrun\pg_combinebackup\002_compare_backups\data\t_002_compare_backups_primary_data\backup\backup1\pg_tblspc\16415


04/22/2024  06:52 AM      .
04/22/2024  06:52 AM      ..
04/22/2024  06:52 AM      PG_17_202404021

Best regards,
Alexander




Re: fix tablespace handling in pg_combinebackup

2024-04-21 Thread Alexander Lakhin

Hello Thomas and Robert,

20.04.2024 08:56, Thomas Munro wrote:

... it still broke[4].  So I'm not sure what's going on...



From what I can see, the following condition (namely, -l):
    if ($path =~ /^pg_tblspc\/(\d+)$/ && -l "$backup_path/$path")
    {
    push @tsoids, $1;
    return 0;
    }

is false for junction points on Windows (cf [1]), but the target path is:
 Directory of 
C:\src\postgresql\build\testrun\pg_combinebackup\002_compare_backups\data\t_002_compare_backups_primary_data\backup\backup1\pg_tblspc


04/21/2024  02:05 PM      .
04/21/2024  02:05 PM      ..
04/21/2024  02:05 PM     16415 
[\??\C:\Users\ADMINI~1\AppData\Local\Temp\xXMfNDMCot\ts1backup]

[1] https://www.perlmonks.org/?node_id=1223819

Best regards,
Alexander




Performance of JSON_TABLE vs jsonb_to_recordset

2024-04-20 Thread Alexander Lakhin

Hello hackers,

When playing with JSON_TABLE, I tried to replace tenk1 in regression tests
with a view based on JSON_TABLE, with the same content, and discovered
that for one sub-optimal query its execution duration increased many-fold.
With the preparation script attached, I see the following durations
(for a build compiled by clang 18.1.3 with -O3):
explain (verbose, analyze)
select
  (select max((select i.unique2 from tenk1 i where i.unique1 = o.unique1)))
from tenk1 o;
-- original tenk1
 Execution Time: 4769.481 ms

explain (verbose, analyze)
select
  (select max((select i.unique2 from jsonb_rs_tenk1 i where i.unique1 = 
o.unique1)))
from jsonb_rs_tenk1 o;
-- Function Call: jsonb_to_recordset...
 Execution Time: 6841.767 ms

explain (verbose, analyze)
select
  (select max((select i.unique2 from jsontable_tenk1 i where i.unique1 = 
o.unique1)))
from jsontable_tenk1 o;
-- Table Function Call: JSON_TABLE...
 Execution Time: 288310.131 ms
(with 63% of time spent inside ExecEvalJsonExprPath())

Just for fun I've tested also XMLTABLE with the similar content:
explain (verbose, analyze)
select
  (select max((select i.unique2 from xml_tenk1 i where i.unique1 = o.unique1)))
from xml_tenk1 o;
-- Table Function Call: XMLTABLE...
 Execution Time: 1235066.636 ms

Maybe it's worth adding a note to the JSON_TABLE() documentation saying that
jsonb_to_recordset is (inherently?) more performant when processing arrays
of flat structures, so that users don't have to re-discover this fact...
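
For reference, the comparison boils down to a pair of equivalent calls like
the following (a trivial illustration, not the attached preparation script):
-- an array of flat objects expanded by jsonb_to_recordset
SELECT count(*)
  FROM jsonb_to_recordset('[{"a":1},{"a":2}]'::jsonb) AS x(a int);
-- the same expansion via JSON_TABLE
SELECT count(*)
  FROM JSON_TABLE('[{"a":1},{"a":2}]'::jsonb, '$[*]' COLUMNS (a int PATH '$.a')) AS jt;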

Best regards,
Alexander

jsontable-perf-setup.sql
Description: application/sql


Re: subscription/026_stats test is intermittently slow?

2024-04-19 Thread Alexander Lakhin

Hello Michael and Robert,

20.04.2024 05:57, Michael Paquier wrote:

On Fri, Apr 19, 2024 at 01:57:41PM -0400, Robert Haas wrote:

It looks to me like in the first run it took 3 minutes for the
replay_lsn to catch up to the desired value, and in the second run,
two seconds. I think I have seen previous instances where something
similar happened, although in those cases I did not stop to record any
details. Have others seen this? Is there something we can/should do
about it?

FWIW, I've also seen delays as well with this test on a few occasions.
Thanks for mentioning it.


It reminds me of
https://www.postgresql.org/message-id/858a7622-2c81-1687-d1df-1322dfcb2e72%40gmail.com

At least, I could reproduce such a delay with the attached patch applied.

Best regards,
Alexander

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 66070e9131..9368a3a897 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -227,6 +227,7 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 
 		if (rc & WL_LATCH_SET)
 		{
+pg_usleep(30);
 			ResetLatch(MyLatch);
 			CHECK_FOR_INTERRUPTS();
 		}
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index d1d68fad9a..a82f1c051c 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -37,6 +37,7 @@ sub create_sub_pub_w_errors
 		$db,
 		qq[
 	BEGIN;
+SELECT pg_sleep(0.5);
 	CREATE TABLE $table_name(a int primary key);
 	INSERT INTO $table_name VALUES (1);
 	COMMIT;


Re: tablecmds.c/MergeAttributes() cleanup

2024-04-19 Thread Alexander Lakhin

Hello,

30.01.2024 09:22, Ashutosh Bapat wrote:



Please look at the segmentation fault triggered by the following query since
4d969b2f8:
CREATE TABLE t1(a text COMPRESSION pglz);
CREATE TABLE t2(a text);
CREATE TABLE t3() INHERITS(t1, t2);
NOTICE:  merging multiple inherited definitions of column "a"
server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

Core was generated by `postgres: law regression [local] CREATE TABLE
 '.
Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
#1  0x5606fbcc9d52 in MergeAttributes (columns=0x0, 
supers=supers@entry=0x5606fe293d30, relpersistence=112 'p',
is_partition=false, supconstr=supconstr@entry=0x7fff4046d410, 
supnotnulls=supnotnulls@entry=0x7fff4046d418)
  at tablecmds.c:2811
#2  0x5606fbccd764 in DefineRelation (stmt=stmt@entry=0x5606fe26a130, 
relkind=relkind@entry=114 'r', ownerId=10,
ownerId@entry=0, typaddress=typaddress@entry=0x0,
  queryString=queryString@entry=0x5606fe2695c0 "CREATE TABLE t3() INHERITS(t1, 
t2);") at tablecmds.c:885

This bug existed even before the refactoring. It happens because strcmp()
is called on NULL input (t2's compression is NULL). I already have a
fix for this and will be posting it in [1].

[1] 
https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org



Now that that fix is closed with RwF [1], shouldn't this crash issue be
added to Open Items for v17?
(I couldn't reproduce the crash on 4d969b2f8~1 nor on REL_16_STABLE.)

https://commitfest.postgresql.org/47/4813/

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-19 Thread Alexander Lakhin

18.04.2024 20:49, Alvaro Herrera wrote:

On 2024-Apr-18, Alexander Lakhin wrote:


I think the feature implementation should also provide tab completion
for SPLIT/MERGE.

I don't think that we should be imposing on feature authors or
committers the task of filling in tab-completion for whatever features
they contribute.  I mean, if they want to add that, cool; but if not,
somebody else can do that, too.  It's not a critical piece.


I agree; I just wanted to note this gap in the current implementation.
But now, thanks to Dagfinn, we have the tab completion too.

I also have a question regarding "ALTER TABLE ... SET ACCESS METHOD". The
current documentation says:
When applied to a partitioned table, there is no data to rewrite, but
partitions created afterwards will default to the given access method
unless overridden by a USING clause.

But MERGE/SPLIT behave differently (if one can assume that MERGE/SPLIT
create new partitions under the hood):
CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;

CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
ALTER TABLE t SET ACCESS METHOD heap2;
CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
\d t+
  Partitioned table "public.t"
...
Access method: heap2

  Table "public.tp_0"
...
Access method: heap2

  Table "public.tp_1"
...
Access method: heap2

ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;
  Partitioned table "public.t"
...
Access method: heap2

  Table "public.tp_0"
...
Access method: heap
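
(A hypothetical catalog query to double-check the same:
SELECT c.relname, a.amname
  FROM pg_class c LEFT JOIN pg_am a ON a.oid = c.relam
  WHERE c.relname IN ('t', 'tp_0');
)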

Shouldn't it be changed? What do you think?

Best regards,
Alexander




Re: cataloguing NOT NULL constraints

2024-04-18 Thread Alexander Lakhin

Hello Alvaro,

18.04.2024 16:39, Alvaro Herrera wrote:

I have pushed a fix which should hopefully fix this problem
(d9f686a72e).  Please give this a look.  Thanks for reporting the issue.


Please look at an assertion failure, introduced with d9f686a72:
CREATE TABLE t(a int, NOT NULL a NO INHERIT);
CREATE TABLE t2() INHERITS (t);

ALTER TABLE t ADD CONSTRAINT nna NOT NULL a;
TRAP: failed Assert("lockmode != NoLock || IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, 
true)"), File: "relation.c", Line: 67, PID: 2980258


On d9f686a72~1 this script results in:
ERROR:  cannot change NO INHERIT status of inherited NOT NULL constraint "t_a_not_null" 
on relation "t"

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Alexander Lakhin

Hi Alexander,

18.04.2024 13:35, Alexander Korotkov wrote:


The revised patchset is attached.
1) I've split the fix for the CommandCounterIncrement() issue and the
fix for relation persistence issue into a separate patch.
2) I've validated that the lock on the new partition is held in
createPartitionTable() after ProcessUtility() as pointed out by
Robert.  So, no need to place the lock again.
3) Added fix for problematic error message as a separate patch [1].
4) Added rename "salemans" => "salesmen" for tests as a separate patch.

I think these fixes are reaching committable shape, but I'd like
someone to check it before I push.


I think the feature implementation should also provide tab completion for
SPLIT/MERGE.
(ALTER TABLE t S
fills in only SET now.)

Also, the following MERGE operation:
CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;

leaves a strange constraint:
\d+ t*
  Table "public.tp_0"
...
Not-null constraints:
    "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"

Best regards,
Alexander




clang's sanitizer makes stringToNode() extremely slow

2024-04-18 Thread Alexander Lakhin

Hello hackers,

When using a server built with clang (15, 18) with sanitizers enabled,
the last query in the following script:
SET parallel_setup_cost = 0;
SET min_parallel_table_scan_size = 0;

SELECT a::text INTO t FROM generate_series(1, 1000) a;
\timing on
SELECT string_agg(a, ',') FROM t WHERE a = REPEAT('0', 40);

runs for more than a minute on my workstation (a larger repeat count gives
much longer duration):
Time: 66017.594 ms (01:06.018)

The typical stack trace for a running parallel worker is:
#0  0x559a23671885 in __sanitizer::internal_strlen(char const*) ()
#1  0x559a236568ed in StrtolFixAndCheck(void*, char const*, char**, char*, 
int) ()
#2  0x559a236423dc in __interceptor_strtol ()
#3  0x559a240027d9 in atoi (...) at readfuncs.c:629
#5  0x559a23fb03f0 in _readConst () at readfuncs.c:275
#6  parseNodeString () at ./readfuncs.switch.c:29
#7  0x559a23faa421 in nodeRead (
    token=0x7fee75cf3bd2 "{CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false 
:constisnull false :location -1 :constvalue 44 [ 16 106 24 0 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48"...,

    tok_len=1) at read.c:338
#8  0x559a23faa916 in nodeRead (...) at read.c:452
#9  0x559a23fb3f34 in _readOpExpr () at ./readfuncs.funcs.c:279
#10 0x559a23fb0842 in parseNodeString () at ./readfuncs.switch.c:47
#11 0x559a23faa421 in nodeRead (...) at read.c:338
#12 0x559a23faa916 in nodeRead (...) at read.c:452
#13 0x559a23fefb74 in _readSeqScan () at ./readfuncs.funcs.c:3954
#14 0x559a23fb2b97 in parseNodeString () at ./readfuncs.switch.c:559
#15 0x559a23faa421 in nodeRead (...) at read.c:338
#16 0x559a23ffc033 in _readAgg () at ./readfuncs.funcs.c:4685
#17 0x559a23fb2dd3 in parseNodeString () at ./readfuncs.switch.c:611
#18 0x559a23faa421 in nodeRead (...) at read.c:338
#19 0x559a23feb340 in _readPlannedStmt () at ./readfuncs.funcs.c:3685
#20 0x559a23fb2ad1 in parseNodeString () at ./readfuncs.switch.c:541
#21 0x559a23faa421 in nodeRead (...) at read.c:338
#22 0x559a23fa99d8 in stringToNodeInternal (...) at read.c:92
#24 0x559a23d66609 in ExecParallelGetQueryDesc (...) at execParallel.c:1250
#25 ParallelQueryMain (...) at execParallel.c:1424
#26 0x559a238cfe13 in ParallelWorkerMain (...) at parallel.c:1516
#27 0x559a241e5b6a in BackgroundWorkerMain (...) at bgworker.c:848
#28 0x559a241ec254 in postmaster_child_launch (...) at launch_backend.c:265
#29 0x559a241f1c15 in do_start_bgworker (...) at postmaster.c:4270
#30 maybe_start_bgworkers () at postmaster.c:4501
#31 0x559a241f486e in process_pm_pmsignal () at postmaster.c:3774
#32 ServerLoop () at postmaster.c:1667
#33 0x559a241f0ed6 in PostmasterMain (...) at postmaster.c:1372
#34 0x559a23ebe16d in main (...) at main.c:197

The flamegraph (which shows that readDatum() -> __interceptor_strtol() ->
StrtolFixAndCheck() -> __sanitizer::internal_strlen () takes >99% of time)
is attached.
(I could not reproduce this behaviour with gcc's sanitizer.)

Moreover, this query cannot be canceled; the only way to interrupt it is to
send SIGKILL to the parallel workers.

Best regards,
Alexander

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-15 Thread Alexander Lakhin

Hello Robert,

15.04.2024 17:30, Robert Haas wrote:

On Sat, Apr 13, 2024 at 6:05 AM Alexander Korotkov  wrote:

Please, find a my version of this fix attached.  I think we need to
check relpersistence in a similar way ATTACH PARTITION or CREATE TABLE
... PARTITION OF do.  I'm going to polish this a little bit more.

+ errmsg("\"%s\" is not an ordinary table",

This is not a phrasing that we use in any other error message. We
always just say "is not a table".


Initially I was confused by that message, because of:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1)
  SERVER loopback OPTIONS (table_name 'lt_0_1');
CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t
  FOR VALUES FROM (1) TO (2)
  SERVER loopback OPTIONS (table_name 'lt_1_2');
ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2;
ERROR:  "ftp_0_1" is not a table
(Isn't a foreign table a table?)

And also:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1);
CREATE TABLE t2 (i int) PARTITION BY RANGE (i);
ALTER TABLE t MERGE PARTITIONS (tp_0_1, t2) INTO tpn;
ERROR:  "t2" is not a table
(Isn't a partitioned table a table?)

And in fact, an ordinary table is not suitable for MERGE anyway:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1);
CREATE TABLE t2 (i int);
ALTER TABLE t MERGE PARTITIONS (tp_0_1, t2) INTO tpn;
ERROR:  "t2" is not a partition

So I don't think that "an ordinary table" is a good (unambiguous) term
either.

Best regards,
Alexander




Re: Parallel CREATE INDEX for BRIN indexes

2024-04-15 Thread Alexander Lakhin

Hello Tomas,

14.04.2024 20:09, Tomas Vondra wrote:

I've pushed this, including backpatching the two fixes. I've reduced the
amount of data needed by the test, and made sure it works on 32-bits too
(I was a bit worried it might be sensitive to that, but that seems not
to be the case).


I've discovered that this test addition makes the test somewhat unstable.
With the following pageinspect/Makefile modification:
-REGRESS = page btree brin gin gist hash checksum oldextversions
+REGRESS = page btree brin $(shell printf 'brin %.0s' `seq 99`) gin gist hash 
checksum oldextversions

echo "autovacuum_naptime = 1" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C contrib/pageinspect
fails for me as below:
...
ok 17    - brin  127 ms
not ok 18    - brin  140 ms
ok 19    - brin  125 ms
...
# 4 of 107 tests failed.

The following change:
-CREATE TABLE brin_parallel_test (a int, b text, c bigint) WITH (fillfactor=40);
+CREATE TEMP TABLE brin_parallel_test (a int, b text, c bigint) WITH 
(fillfactor=40);
(similar to e2933a6e1) makes the test pass reliably for me.

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-12 Thread Alexander Lakhin

Hi Dmitry,

12.04.2024 16:04, Dmitry Koval wrote:

Hi!

Attached is a patch with corrections based on comments in previous letters (I 
think these corrections are not final).
I'll be very grateful for feedbacks and bug reports.

11.04.2024 20:00, Alexander Lakhin wrote:
> maybe an attempt to merge into implicit
> pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does?

Corrected. Result is:


Thank you!
Still, we're now able to create a partition in the pg_temp schema
explicitly. Please try:
ALTER TABLE t
MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2;

in the scenario [1] and you'll get the same empty table.

[1] 
https://www.postgresql.org/message-id/fdaa003e-919c-cbc9-4f0c-e4546e96bd65%40gmail.com
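
For convenience, the complete sequence (assuming the same setup as in the
scenario from [1]):

CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1_2 PARTITION OF t
  FOR VALUES FROM (1) TO (2);
INSERT INTO t VALUES (0), (1);
ALTER TABLE t
MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2;
\c -
SELECT * FROM t;
-- returns no rows, because the merged partition was created as a temporary table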

Best regards,
Alexander




Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Alexander Lakhin

12.04.2024 08:05, Alexander Lakhin wrote:

2024-04-12 05:00:17.981 UTC [762336] PANIC:  stuck spinlock detected at 
WaitBufHdrUnlocked, bufmgr.c:5726



It looks like that spinlock issue is caused by a race condition/deadlock.
What I see when the test fails is:
A client backend executing "DROP DATABASE conflict_db" performs
dropdb() -> DropDatabaseBuffers() -> InvalidateBuffer()
At the same time, bgwriter performs (for the same buffer):
BgBufferSync() -> SyncOneBuffer()

When InvalidateBuffer() is called, the buffer refcount is zero,
then bgwriter pins the buffer, thus increases refcount;
InvalidateBuffer() gets into the retry loop;
bgwriter calls UnpinBuffer() -> UnpinBufferNoOwner() ->
  WaitBufHdrUnlocked(), which waits for !BM_LOCKED state,
while InvalidateBuffer() waits for the buffer refcount decrease.

As it turns out, it's not related to spinlocks' specifics or PRNG, just a
serendipitous find.

Best regards,
Alexander




Re: Issue with the PRNG used by Postgres

2024-04-11 Thread Alexander Lakhin

Hi Andres,

12.04.2024 07:41, Andres Freund wrote:


FWIW, I just reproduced the scenario with signals. I added tracking of the
total time actually slept and lost to SpinDelayStatus, and added a function to
trigger a wait on a spinlock.

To wait less, I set max_standby_streaming_delay=0.1, but that's just for
easier testing in isolation. In reality that could have been reached before
the spinlock is even acquired.

On a standby, while a recovery conflict is happening:
PANIC:  XX000: stuck spinlock detected at crashme, path/to/file:line, after 
4.38s, lost 127.96s


So right now it's really not hard to trigger the stuck-spinlock logic
completely spuriously.  This doesn't just happen with hot standby, there are
plenty other sources of lots of signals being sent.


I managed to trigger that logic when trying to construct a reproducer
for bug #18426.

With the following delays added:
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1776,6 +1776,7 @@ retry:
  */
 if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
 {
+pg_usleep(30L);
 UnlockBufHdr(buf, buf_state);
 LWLockRelease(oldPartitionLock);
 /* safety check: should definitely not be our *own* pin */
@@ -5549,6 +5550,7 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, 
uint32 set_flag_bits,

 Assert(buf_state & BM_IO_IN_PROGRESS);

+pg_usleep(300);
 buf_state &= ~(BM_IO_IN_PROGRESS | BM_IO_ERROR);
 if (clear_dirty && !(buf_state & BM_JUST_DIRTIED))
 buf_state &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED);

and /tmp/temp.config:
bgwriter_delay = 10

TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/recovery 
PROVE_TESTS="t/032*"
fails for me on iterations 22, 23, 37:
2024-04-12 05:00:17.981 UTC [762336] PANIC:  stuck spinlock detected at 
WaitBufHdrUnlocked, bufmgr.c:5726

I haven't investigated this case yet.

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-11 Thread Alexander Lakhin



11.04.2024 16:27, Dmitry Koval wrote:


Added correction (and test), see 
v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch.



Thank you for the correction, but maybe an attempt to merge into implicit
pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does?

Please look also at another anomaly with schemas:
CREATE SCHEMA s1;
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_2 PARTITION OF t
  FOR VALUES FROM (0) TO (2);
ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
  (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1), PARTITION s1.tp1 FOR VALUES 
FROM (1) TO (2));
results in:
\d+ s1.*
Did not find any relation named "s1.*"
\d+ tp*
  Table "public.tp0"
...
   Table "public.tp1"
...

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-11 Thread Alexander Lakhin

Hi Dmitry,

11.04.2024 11:59, Dmitry Koval wrote:



FWIW, I also proposed a patch earlier that fixes error messages and
comments in the split partition code


Sorry, I thought all the fixes you suggested were already included in v1-0002-Fixes-for-english-text.patch (but they 
are not).

Added missing lines to v2-0002-Fixes-for-english-text.patch.



It seems to me that 
v2-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch
is not complete either.
Take a look, please:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
SET search_path = pg_temp, public;
CREATE TABLE tp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1);
-- fails with:
ERROR:  cannot create a temporary relation as partition of permanent relation 
"t"

But:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1_2 PARTITION OF t
  FOR VALUES FROM (1) TO (2);
INSERT INTO t VALUES(0), (1);
SELECT * FROM t;
-- the expected result is:
 i
---
 0
 1
(2 rows)

SET search_path = pg_temp, public;
ALTER TABLE t
MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
-- succeeds, and
\c -
SELECT * FROM t;
-- gives:
 i
---
(0 rows)

Please also ask your tech writers to check contents of src/test/sql/*, if
possible (perhaps, they'll fix "salesmans" and improve grammar).

Best regards,
Alexander

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-10 Thread Alexander Lakhin

10.04.2024 12:00, Alexander Lakhin wrote:

Hello Alexander and Dmitry,

10.04.2024 02:03, Alexander Korotkov wrote:

Thank you.  I've pushed this fix with minor corrections from me.




Please look at another anomaly with MERGE.

CREATE TEMP TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_2 PARTITION OF t
  FOR VALUES FROM (0) TO (2);
fails with
ERROR:  cannot create a permanent relation as partition of temporary relation 
"t"

But
CREATE TEMP TABLE t (i int) PARTITION BY RANGE (i);
CREATE TEMP TABLE tp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1);
CREATE TEMP TABLE tp_1_2 PARTITION OF t
  FOR VALUES FROM (1) TO (2);
ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
succeeds and we get:
regression=# \d+ t*
                              Partitioned table "pg_temp_1.t"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           |          |         | plain   |             |              |
Partition key: RANGE (i)
Partitions: tp_0_2 FOR VALUES FROM (0) TO (2)

 Table "public.tp_0_2"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | 
Stats target | Description
+-+---+--+-+-+-+--+-
 i  | integer |   |  | | plain | |  
    |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-10 Thread Alexander Lakhin

Hello Alexander and Dmitry,

10.04.2024 02:03, Alexander Korotkov wrote:

On Mon, Apr 8, 2024 at 11:43 PM Dmitry Koval  wrote:

Attached fix for the problems found by Alexander Lakhin.

About grammar errors.
Unfortunately, I don't know English well.
Therefore, I plan (in the coming days) to show the text to specialists
who perform technical translation of documentation.

Thank you.  I've pushed this fix with minor corrections from me.


Thank you for fixing that defect!

Please look at an error message emitted for foreign tables:
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE FOREIGN TABLE ftp_0_1 PARTITION OF t
  FOR VALUES FROM (0) TO (1)
  SERVER loopback OPTIONS (table_name 'lt_0_1');
CREATE FOREIGN TABLE ftp_1_2 PARTITION OF t
  FOR VALUES FROM (1) TO (2)
  SERVER loopback OPTIONS (table_name 'lt_1_2');
ALTER TABLE t MERGE PARTITIONS (ftp_0_1, ftp_1_2) INTO ftp_0_2;
ERROR:  "ftp_0_1" is not a table

Shouldn't it be more correct and precise?

Best regards,
Alexander




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Alexander Lakhin

08.04.2024 18:08, Tom Lane wrote:

Alexander Lakhin  writes:

Now that ExecQueryUsingCursor() is gone, it's not clear what the
following comment means:
     * We must turn off gexec_flag to avoid infinite recursion.  Note that
     * this allows ExecQueryUsingCursor to be applied to the individual query
     * results.

Hmm, the point about recursion is still valid isn't it?  I agree the
reference to ExecQueryUsingCursor is obsolete, but I think we need to
reconstruct what this comment is actually talking about.  It's
certainly pretty obscure ...


Sorry, I wasn't clear enough; I meant to remove only that reference, not
the quoted comment altogether.

Best regards,
Alexander




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Alexander Lakhin

Hello Daniel and Tom,

08.04.2024 17:25, Daniel Verite wrote:



So I whacked the patch around till I liked it better, and pushed it.

Thanks for taking care of this!


Now that ExecQueryUsingCursor() is gone, it's not clear what the
following comment means:
   * We must turn off gexec_flag to avoid infinite recursion.  Note that
   * this allows ExecQueryUsingCursor to be applied to the individual query
   * results.

Shouldn't it be removed?

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-08 Thread Alexander Lakhin

Hi Tender Wang,

08.04.2024 13:43, Tender Wang wrote:

Hi all,
  I went through the MERGE/SPLIT partition code today, thanks for the work.
I found some grammar errors:
 i. in error messages (users can see these grammar errors, which is not friendly).
ii. in code comments



At a quick glance, I also saw:
NULL-value
partitionde
splited
temparary

And a trailing whitespace at:
 the quarter partition back to monthly partitions:
warning: 1 line adds whitespace errors.

I'm also confused by "administrators" here:
https://www.postgresql.org/docs/devel/ddl-partitioning.html

(We can find on the same page, for instance:
... whereas table inheritance allows data to be divided in a manner of
the user's choosing.
It seems to me that "users" should work for merging partitions as well.)

Though the documentation addition requires more than just a quick glance,
of course.

Best regards,
Alexander




Re: Speed up clean meson builds by ~25%

2024-04-08 Thread Alexander Lakhin

Hello Jelte,

08.04.2024 11:36, Jelte Fennema-Nio wrote:

On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin  wrote:

As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

Using the attached script I got these timings. Clang is significantly
slower in all of them. But especially with -Og the difference between
is huge.

gcc 11.4.0: 7.276s
clang 18.1.3: 17.216s
gcc 11.4.0 --debug: 7.441s
clang 18.1.3 --debug: 18.164s
gcc 11.4.0 --debug -Og: 2.418s
clang 18.1.3 --debug -Og: 14.864s

I reported this same issue to the LLVM project here:
https://github.com/llvm/llvm-project/issues/87973


Maybe we're talking about different problems.
At [1], Thomas (and then I) was unhappy with a duration of more than
200 seconds...

https://www.postgresql.org/message-id/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com

Best regards,
Alexander




Re: Speed up clean meson builds by ~25%

2024-04-08 Thread Alexander Lakhin

Hello Michael,

08.04.2024 08:23, Michael Paquier wrote:

On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote:

Agreed. While not a full solution, I think this patch is still good to
address some of the pain: Waiting 10 seconds at the end of my build
with only 1 of my 10 cores doing anything.

So while this doesn't decrease CPU time spent it does decrease
wall-clock time significantly in some cases, with afaict no downsides.

Well, this is also painful with ./configure.  So, even if we are going
to move away from it at this point, we still need to support it for a
couple of years.  It looks to me that letting the clang folks know
about the situation is the best way forward.



As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
is fixed already.
Perhaps it's worth rechecking...

[1] 
https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-07 Thread Alexander Lakhin

08.04.2024 01:15, Alexander Korotkov wrote:

Thank you for spotting this.  This seems like a missing check.  I'm
going to get a closer look at this tomorrow.



Thanks!

There is also an anomaly with the MERGE command:
CREATE TABLE t1 (i int, a int, b int, c int) PARTITION BY RANGE (a, b);
CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES FROM (1, 1) TO (1, 2);

CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t);
CREATE TABLE t2pa PARTITION OF t2 FOR VALUES FROM ('A') TO ('C');

CREATE TABLE t3 (i int, t text);

ALTER TABLE t2 MERGE PARTITIONS (t1p1, t2pa, t3) INTO t2p;

leads to:
ERROR:  partition bound for relation "t3" is null
WARNING:  problem in alloc set PortalContext: detected write past chunk end in 
block 0x55f1ef42f820, chunk 0x55f1ef42ff40
WARNING:  problem in alloc set PortalContext: detected write past chunk end in 
block 0x55f1ef42f820, chunk 0x55f1ef42ff40

(I'm also not sure that the error message is clear enough; can't we say
"relation X is not a partition of relation Y" in this context, as in
MarkInheritDetached(), for example?)
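
Just to illustrate the wording I have in mind (a rough sketch only; the
variable names and the exact error code are assumptions, not the actual code):

ereport(ERROR,
        (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
         errmsg("relation \"%s\" is not a partition of relation \"%s\"",
                RelationGetRelationName(partRel),
                RelationGetRelationName(parentRel))));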

Whilst with
ALTER TABLE t2 MERGE PARTITIONS (t1p1, t2pa) INTO t2p;

I get:
Program terminated with signal SIGSEGV, Segmentation fault.

#0  pg_detoast_datum_packed (datum=0x1) at fmgr.c:1866
1866    if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum))
(gdb) bt
#0  pg_detoast_datum_packed (datum=0x1) at fmgr.c:1866
#1  0x55d77d00fde2 in bttextcmp (...) at 
../../../../src/include/postgres.h:314
#2  0x55d77d03fa27 in FunctionCall2Coll (...) at fmgr.c:1161
#3  0x55d77ce1572f in partition_rbound_cmp (...) at partbounds.c:3525
#4  0x55d77ce157b9 in qsort_partition_rbound_cmp (...) at partbounds.c:3816
#5  0x55d77d0982ef in qsort_arg (...) at 
../../src/include/lib/sort_template.h:316
#6  0x55d77ce1d109 in calculate_partition_bound_for_merge (...) at 
partbounds.c:5786
#7  0x55d77cc24b2b in transformPartitionCmdForMerge (...) at 
parse_utilcmd.c:3524
#8  0x55d77cc2b555 in transformAlterTableStmt (...) at parse_utilcmd.c:3812
#9  0x55d77ccab17c in ATParseTransformCmd (...) at tablecmds.c:5650
#10 0x55d77ccafd09 in ATExecCmd (...) at tablecmds.c:5589
...

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-07 Thread Alexander Lakhin

Hi Alexander and Dmitry,

07.04.2024 01:22, Alexander Korotkov wrote:

I've pushed 0001 and 0002.  I didn't push 0003 for the following reasons.


Please try the following (erroneous) query:
CREATE TABLE t1(i int, t text) PARTITION BY LIST (t);
CREATE TABLE t1pa PARTITION OF t1 FOR VALUES IN ('A');

CREATE TABLE t2 (i int, t text) PARTITION BY RANGE (t);
ALTER TABLE t2 SPLIT PARTITION t1pa INTO
  (PARTITION t2a FOR VALUES FROM ('A') TO ('B'),
   PARTITION t2b FOR VALUES FROM ('B') TO ('C'));

that triggers an assertion failure:
TRAP: failed Assert("datums != NIL"), File: "partbounds.c", Line: 3434, PID: 
1841459

or a segfault (in a non-assert build):
Program terminated with signal SIGSEGV, Segmentation fault.

#0  pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866
1866    if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum))
(gdb) bt
#0  pg_detoast_datum_packed (datum=0x0) at fmgr.c:1866
#1  0x55f38c5d5e3f in bttextcmp (...) at varlena.c:1834
#2  0x55f38c6030dd in FunctionCall2Coll (...) at fmgr.c:1161
#3  0x55f38c417c83 in partition_rbound_cmp (...) at partbounds.c:3525
#4  check_partition_bounds_for_split_range (...) at partbounds.c:5221
#5  check_partitions_for_split (...) at partbounds.c:5688
#6  0x55f38c256c49 in transformPartitionCmdForSplit (...) at 
parse_utilcmd.c:3451
#7  transformAlterTableStmt (...) at parse_utilcmd.c:3810
#8  0x55f38c2bdf9c in ATParseTransformCmd (...) at tablecmds.c:5650
...

Best regards,
Alexander




Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-04-07 Thread Alexander Lakhin

Hello Peter,

03.04.2024 22:53, Peter Geoghegan wrote:

On Mon, Apr 1, 2024 at 6:33 PM Peter Geoghegan  wrote:

Note: v18 doesn't have any adjustments to the costing, as originally
planned. I'll probably need to post a revised patch with improved (or
at least polished) costing in the next few days, so that others will
have the opportunity to comment before I commit the patch.

Attached is v19, which dealt with remaining concerns I had about the
costing in selfuncs.c. My current plan is to commit this on Saturday
morning (US Eastern time).


Please look at an assertion failure (reproduced starting from 5bf748b86),
triggered by the following query:
CREATE TABLE t (a int, b int);
CREATE INDEX t_idx ON t (a, b);
INSERT INTO t (a, b) SELECT g, g FROM generate_series(0, 999) g;
ANALYZE t;
SELECT * FROM t WHERE a < ANY (ARRAY[1]) AND b < ANY (ARRAY[1]);

TRAP: failed Assert("so->numArrayKeys"), File: "nbtutils.c", Line: 560, PID: 
3251267

Best regards,
Alexander




Re: remaining sql/json patches

2024-04-05 Thread Alexander Lakhin

05.04.2024 10:09, Amit Langote wrote:

Seems like it might be a pre-existing issue, because I can also
reproduce the crash with:

SELECT * FROM COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Backtrace:

#0  __pthread_kill_implementation (threadid=281472845250592,
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6,
threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368
"!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion",
fileName=0xcbb2c8 "nodeFunctionscan.c",
 lineNumber=379) at assert.c:54


That's strange, because I get the error (on master, 6f132ed69).
With backtrace_functions = 'tupledesc_match', I see
2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR: function return row and query-specified return row do 
not match

2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL: Returned 
row contains 1 attribute, but query expects 2.
2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE:
tupledesc_match at execSRF.c:948:3
ExecMakeTableFunctionResult at execSRF.c:427:13
FunctionNext at nodeFunctionscan.c:94:5
ExecScanFetch at execScan.c:131:10
ExecScan at execScan.c:180:10
ExecFunctionScan at nodeFunctionscan.c:272:1
ExecProcNodeFirst at execProcnode.c:465:1
ExecProcNode at executor.h:274:9
 (inlined by) ExecutePlan at execMain.c:1646:10
standard_ExecutorRun at execMain.c:363:3
ExecutorRun at execMain.c:305:1
PortalRunSelect at pquery.c:926:26
PortalRun at pquery.c:775:8
exec_simple_query at postgres.c:1282:3
PostgresMain at postgres.c:4684:27
BackendMain at backend_startup.c:57:2
pgarch_die at pgarch.c:847:1
BackendStartup at postmaster.c:3593:8
ServerLoop at postmaster.c:1674:6
main at main.c:184:3
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f37127f0e40]
2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT:  SELECT 
* FROM COALESCE(row(1)) AS (a int, b int);

That's why I had attributed the failure to JSON_TABLE().

Though SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);
really triggers the assert too.
Sorry for the noise...

Best regards,
Alexander




Re: remaining sql/json patches

2024-04-05 Thread Alexander Lakhin

Hello Amit,

04.04.2024 15:02, Amit Langote wrote:

Pushed after fixing these and a few other issues.  I didn't include
the testing function you proposed in your other email.  It sounds
useful for testing locally but will need some work before we can
include it in the tree.

I'll post the rebased 0002 tomorrow after addressing your comments.


Please look at an assertion failure:
TRAP: failed Assert("count <= tupdesc->natts"), File: "parse_relation.c", Line: 
3048, PID: 1325146

triggered by the following query:
SELECT * FROM JSON_TABLE('0', '$' COLUMNS (js int PATH '$')),
  COALESCE(row(1)) AS (a int, b int);

Without JSON_TABLE() I get:
ERROR:  function return row and query-specified return row do not match
DETAIL:  Returned row contains 1 attribute, but query expects 2.

Best regards,
Alexander




Re: Recent 027_streaming_regress.pl hangs

2024-04-04 Thread Alexander Lakhin

Hello Andres,

26.03.2024 10:59, Andres Freund wrote:

Late, will try to look more in the next few days.



AFAICS, the last 027_streaming_regress.pl failures on calliphoridae,
culicidae, and tamandua occurred before 2024-03-27:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-26%2004%3A07%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2024-03-22%2013%3A26%3A21
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua=2024-03-24%2007%3A44%3A27

So it looks like that issue is resolved, but there is another, apparently
performance-related, issue: deadlock-parallel test failures.

A recent one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=canebrake=2024-04-02%2022%3A20%3A22
test deadlock-parallel    ... FAILED   345099 ms

+isolationtester: canceling step d2a1 after 300 seconds
 step d2a1: <... completed>
-  sum
--
-1
-(1 row)
-
...

The server log shows:
2024-04-02 23:56:45.353 UTC [3583878][client backend][5/530:0] LOG: statement: 
SET force_parallel_mode = on;
...
              SELECT lock_share(3,x) FROM bigt LIMIT 1;
2024-04-02 23:56:45.364 UTC [3583876][client backend][3/2732:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')

2024-04-02 23:56:45.364 UTC [3583876][client backend][3/2732:0] DETAIL:  
parameters: $1 = '3583878'
...
2024-04-02 23:57:28.967 UTC [3583876][client backend][3/5097:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')

2024-04-02 23:57:28.967 UTC [3583876][client backend][3/5097:0] DETAIL:  
parameters: $1 = '3583877'
2024-04-02 23:57:29.016 UTC [3583877][client backend][4/530:0] LOG: statement: 
COMMIT;
2024-04-02 23:57:29.039 UTC [3583876][client backend][3/5098:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')

2024-04-02 23:57:29.039 UTC [3583876][client backend][3/5098:0] DETAIL:  
parameters: $1 = '3583879'
...
2024-04-03 00:02:29.096 UTC [3583876][client backend][3/9472:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')

2024-04-03 00:02:29.096 UTC [3583876][client backend][3/9472:0] DETAIL:  
parameters: $1 = '3583878'
2024-04-03 00:02:29.172 UTC [3905345][not initialized][:0] LOG: connection 
received: host=[local]
2024-04-03 00:02:29.240 UTC [3583878][client backend][5/530:0] ERROR:  
canceling statement due to user request

The last step duration is 00:02:29.096 - 23:57:29.039 ~ 300 seconds, which is
the default max_step_wait for REL_15_STABLE and older (for REL_16_STABLE and
newer, the default value was increased to 360 by c99c67fc4).

The average deadlock-parallel duration for REL_15_STABLE on canebrake is
around 128 seconds (for the 140 runs I analyzed), but we can also find:
test deadlock-parallel    ... ok   377895 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake=2024-03-27%2001%3A06%3A24=isolation-check
test deadlock-parallel    ... ok   302549 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake=2023-11-06%2012%3A47%3A01=isolation-check
test deadlock-parallel    ... ok   255045 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake=2023-11-09%2010%3A02%3A59=isolation-check

The similar situation on phycodurus:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurusdt=2024-02-11%2021:05:41
test deadlock-parallel    ... FAILED   389381 ms

The average deadlock-parallel duration for REL_13_STABLE on phycodurus is
around 78 seconds (for 138 recent runs), but there were also:
test deadlock-parallel    ... ok   441736 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=phycodurus=2024-03-04%2015%3A19%3A04=isolation-check
test deadlock-parallel    ... ok   187844 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=phycodurus=2023-11-03%2016%3A13%3A46=isolation-check

And also pogona, REL_14_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona=2024-02-20%2003%3A50%3A49
test deadlock-parallel    ... FAILED   425482 ms

(I could reach a similar duration on a slowed-down VM, with JIT enabled as
on these animals.)

So maybe these machines require a larger PGISOLATIONTIMEOUT, or there is
still some OS/environment issue there?
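
For example, something like this raises the per-step timeout when running the
isolation tests by hand (whether the buildfarm client passes this environment
variable through to the test driver is an assumption I haven't checked):

# sketch: give a slow animal a 10-minute per-step timeout instead of the default
PGISOLATIONTIMEOUT=600 make -s check -C src/test/isolation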

Best regards,
Alexander




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-04-03 Thread Alexander Lakhin

Hello Alvaro,

27.02.2024 20:33, Alvaro Herrera wrote:

Here's the complete set, with these two names using the singular.



I've managed to trigger an assert added by 53c2a97a9.
Please try the following script against a server compiled with
-DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
define; it just simplifies reproducing...):
# initdb & start ...

createdb test
echo "
SELECT pg_current_xact_id() AS tx
\gset

SELECT format('CREATE TABLE t%s(i int)', g)
  FROM generate_series(1, 1022 - :tx) g
\gexec

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT pg_current_xact_id();
SELECT pg_sleep(5);
" | psql test &

echo "
SELECT pg_sleep(1);
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 1 INTO a;
COMMIT;

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 2 INTO b;
" | psql test

It fails for me with the following stack trace:
TRAP: failed Assert("LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE)"), File: "slru.c", Line: 366, 
PID: 21711

ExceptionalCondition at assert.c:52:13
SimpleLruZeroPage at slru.c:369:11
SerialAdd at predicate.c:921:20
SummarizeOldestCommittedSxact at predicate.c:1521:2
GetSerializableTransactionSnapshotInt at predicate.c:1787:16
GetSerializableTransactionSnapshot at predicate.c:1691:1
GetTransactionSnapshot at snapmgr.c:264:21
exec_simple_query at postgres.c:1162:4
...

Best regards,
Alexander




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-31 Thread Alexander Lakhin

Hello Alvaro,

28.03.2024 18:58, Alvaro Herrera wrote:

Grumble.  I don't like initialization at declare time, so far from the
code that depends on the value.  But the alternative would have been to
assign right where this blocks starts, an additional line.  I pushed it
like you had it.


I've stumbled upon a test failure caused by the test query added in that
commit:
--- .../src/test/regress/expected/create_am.out  2024-03-28 
12:14:11.700764888 -0400
+++ .../src/test/recovery/tmp_check/results/create_am.out 2024-03-31 
03:10:28.172244122 -0400
@@ -549,7 +549,10 @@
 ERROR:  access method "btree" is not of type TABLE
 -- Other weird invalid cases that cause problems
 CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;
-ERROR:  "pg_am" is not partitioned
+ERROR:  deadlock detected
+DETAIL:  Process 3076180 waits for AccessShareLock on relation 1259 of 
database 16386; blocked by process 3076181.
+Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; 
blocked by process 3076180.
+HINT:  See server log for query details.
 -- Drop table access method, which fails as objects depends on it
 DROP ACCESS METHOD heap2;
 ERROR:  cannot drop access method heap2 because other objects depend on it

027_stream_regress_primary.log contains:
2024-03-31 03:10:26.728 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM 
FULL pg_class;
...
2024-03-31 03:10:26.797 EDT [3076180] pg_regress/create_am LOG: statement: CREATE FOREIGN TABLE fp PARTITION OF pg_am 
DEFAULT SERVER x;

...
2024-03-31 03:10:28.183 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM 
FULL pg_database;

This simple demo confirms the issue:
for ((i=1;i<=20;i++)); do
echo "iteration $i"
echo "VACUUM FULL pg_class;" | psql >psql-1.log &
echo "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;" | psql 
>psql-2.log &
wait
done

...
iteration 15
ERROR:  "pg_am" is not partitioned
iteration 16
ERROR:  deadlock detected
DETAIL:  Process 2556377 waits for AccessShareLock on relation 1259 of database 
16384; blocked by process 2556378.
Process 2556378 waits for AccessShareLock on relation 2601 of database 16384; 
blocked by process 2556377.
HINT:  See server log for query details.
...

Best regards,
Alexander




Re: To what extent should tests rely on VACUUM ANALYZE?

2024-03-29 Thread Alexander Lakhin

Hello Tom,

29.03.2024 16:51, Tom Lane wrote:

Alexander Lakhin  writes:

I think that deviation can be explained by the fact that cost_index() takes
baserel->allvisfrac (derived from pg_class.relallvisible) into account for
the index-only-scan case, and I see the following difference when a test
run fails:
      relname | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count
  ------------+----------+-----------+---------------+------------+------------------+-------------------
- tenk1       |      345 |         1 |           345 |            |                0 |                 0
+ tenk1       |      345 |         1 |           305 |            |                0 |                 0

Ouch.  So what's triggering that?  The intention of test_setup
surely is to provide a uniform starting point.


Thanks for your attention to the issue!
Please try the attached...

Best regards,
Alexander

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c..d60c117378 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5041,6 +5041,7 @@ ConditionalLockBufferForCleanup(Buffer buffer)
 
 	Assert(BufferIsValid(buffer));
 
+if (rand() % 10 == 0) return false;
 	if (BufferIsLocal(buffer))
 	{
 		refcount = LocalRefCount[-buffer - 1];
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 70ab47a92f..93c1f4f08d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1932,6 +1932,32 @@ ORDER BY unique1;
   42
 (3 rows)
 
+explain (verbose)
+SELECT thousand, tenthous FROM tenk1
+WHERE thousand < 2 AND tenthous IN (1001,3000)
+ORDER BY thousand;
+  QUERY PLAN  
+--
+ Index Only Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..4.71 rows=1 width=8)
+   Output: thousand, tenthous
+   Index Cond: (tenk1.thousand < 2)
+   Filter: (tenk1.tenthous = ANY ('{1001,3000}'::integer[]))
+(4 rows)
+
+select c.relname,c.relpages,c.reltuples,c.relallvisible,i.indisvalid,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+left join pg_index i on c.oid = i.indexrelid
+where c.relname like 'tenk1%' order by c.relname;
+   relname| relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count 
+--+--+---+---++--+---
+ tenk1|  345 | 1 |   345 ||0 | 0
+ tenk1_hundred|   11 | 1 | 0 | t  |  |  
+ tenk1_thous_tenthous |   30 | 1 | 0 | t  |  |  
+ tenk1_unique1|   30 | 1 | 0 | t  |  |  
+ tenk1_unique2|   30 | 1 | 0 | t  |  |  
+(5 rows)
+
 explain (costs off)
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index d49ce9f300..732b54fb35 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -765,6 +765,17 @@ SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
 
+explain (verbose)
+SELECT thousand, tenthous FROM tenk1
+WHERE thousand < 2 AND tenthous IN (1001,3000)
+ORDER BY thousand;
+
+select c.relname,c.relpages,c.reltuples,c.relallvisible,i.indisvalid,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+left join pg_index i on c.oid = i.indexrelid
+where c.relname like 'tenk1%' order by c.relname;
+
 explain (costs off)
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)


Re: To what extent should tests rely on VACUUM ANALYZE?

2024-03-29 Thread Alexander Lakhin

29.03.2024 11:59, Alexander Lakhin wrote:


But it looks like subselect is not the only test that can fail due to
vacuum instability. I see that create_index also suffers from cranky
ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)
return false; ), although it is placed in parallel_schedule before
sanity_check, so this failure needs another explanation:
-  QUERY PLAN

- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+  QUERY PLAN
+--
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+ Index Cond: ((thousand < 2) AND (tenthous = ANY 
('{1001,3000}'::integer[])))
+(4 rows)


I think that deviation can be explained by the fact that cost_index() takes
baserel->allvisfrac (derived from pg_class.relallvisible) into account for
the index-only-scan case, and I see the following difference when a test
run fails:
     relname | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count
 ------------+----------+-----------+---------------+------------+------------------+-------------------
- tenk1      |      345 |         1 |           345 |            |                0 |                 0
+ tenk1      |      345 |         1 |           305 |            |                0 |                 0

Best regards,
Alexander




Re: To what extent should tests rely on VACUUM ANALYZE?

2024-03-29 Thread Alexander Lakhin

29.03.2024 11:59, Alexander Lakhin wrote:


This simple change fixes the issue for me:
-VACUUM ANALYZE tenk2;
+VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;



I'm sorry, I wasn't persevering enough when testing that...
After more test runs, I see that in fact it doesn't help.

Best regards,
Alexander




Re: To what extent should tests rely on VACUUM ANALYZE?

2024-03-29 Thread Alexander Lakhin

28.03.2024 20:33, Tom Lane wrote:



But I'm a bit confused - how come the estimates do change at all? The
analyze simply fetches 30k rows, and tenk only has 10k of them. So we
should have *exact* numbers, and it should be exactly the same for all
the analyze runs. So how come it changes like this?

It's plausible that the VACUUM ANALYZE done by test_setup fails
ConditionalLockBufferForCleanup() sometimes because of concurrent
activity like checkpointer writes.  I'm not quite sure how we
get from that to the observed symptom though.  Maybe the
VACUUM needs DISABLE_PAGE_SKIPPING?


Yeah, the path from ConditionalLockBufferForCleanup() returning false to
reltuples < 1 is not a single step, as I thought initially. There is also
sanity_check doing a VACUUM in between.
scenario is:
VACUUM ANALYZE tenk2; -- with cleanup lock not granted for some blocks
VACUUM tenk2;

In this scenario, lazy_scan_heap() -> vac_estimate_reltuples() is called
twice.
First, with rel_pages: 384, vacrel->scanned_pages: 384,
vacrel->live_tuples: 1 and it results in
vacrel->new_live_tuples = 1,

And second, with rel_pages: 345, vacrel->scanned_pages: 80,
vacrel->live_tuples: 2315 (for instance), and we get
vacrel->new_live_tuples = 9996,

With unmodified ConditionalLockBufferForCleanup() the second call is
performed with rel_pages: 345, vacrel->scanned_pages: 1,
vacrel->live_tuples: 24 and it returns 1.
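To observe the resulting estimate, one can simply check pg_class after the
second VACUUM, for instance:
SELECT relname, relpages, reltuples, relallvisible FROM pg_class WHERE relname = 'tenk2';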

This simple change fixes the issue for me:
-VACUUM ANALYZE tenk2;
+VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;

But it looks like subselect is not the only test that can fail due to
vacuum instability. I see that create_index also suffers from cranky
ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)
return false; ), although it is placed in parallel_schedule before
sanity_check, so this failure needs another explanation:
-  QUERY PLAN

- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+  QUERY PLAN
+--
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+ Index Cond: ((thousand < 2) AND (tenthous = ANY 
('{1001,3000}'::integer[])))
+(4 rows)

Best regards,
Alexander




To what extent should tests rely on VACUUM ANALYZE?

2024-03-28 Thread Alexander Lakhin

Hello hackers,

When running multiple 027_stream_regress.pl test instances in parallel
(and with aggressive autovacuum) on a rather slow machine, I encountered
test failures caused by subselect test instability, just like the following
buildfarm failures:
1) 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2024-03-27%2010%3A16%3A12

--- 
/home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/subselect.out 
2024-03-19 22:20:34.435867114 +
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/subselect.out 
2024-03-27 10:28:38.185776605 +

@@ -2067,16 +2067,16 @@
    QUERY PLAN
 -
  Hash Join
-   Hash Cond: (c.odd = b.odd)
+   Hash Cond: (c.hundred = a.hundred)
    ->  Hash Join
- Hash Cond: (a.hundred = c.hundred)
- ->  Seq Scan on tenk1 a
+ Hash Cond: (b.odd = c.odd)
+ ->  Seq Scan on tenk2 b
  ->  Hash
    ->  HashAggregate
  Group Key: c.odd, c.hundred
  ->  Seq Scan on tenk2 c
    ->  Hash
- ->  Seq Scan on tenk2 b
+ ->  Seq Scan on tenk1 a
 (11 rows)

2) 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2024-03-27%2009%3A49%3A38

(That query was added recently (by 9f1337639 from 2023-02-15) and the
failure evidently depends on timing, so the number of failures I could
find on the buildfarm is moderate for now.)

With the subselect test modified as attached, I could see what makes
the plan change:
- ->  Seq Scan on public.tenk2 c (cost=0.00..445.00 
rows=1 width=8)
+ ->  Seq Scan on public.tenk2 c (cost=0.00..444.95 
rows=9995 width=8)

  relname | relpages | reltuples | autovacuum_count | autoanalyze_count
 -+--+---+--+---
- tenk2   |  345 | 1 |    0 | 0
+ tenk2   |  345 |  9995 |    0 | 0

Using the trick Thomas proposed in [1] (see my modification attached), I
could reproduce the failure easily on my workstation with no specific
conditions:
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup LOG:  !!!ConditionalLockBufferForCleanup() 
returning false
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup CONTEXT:  while scanning block 29 of relation 
"public.tenk2"

2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup 
STATEMENT:  VACUUM ANALYZE tenk2;
...
  relname | relpages | reltuples | autovacuum_count | autoanalyze_count
 -+--+---+--+---
- tenk2   |  345 | 1 |    0 | 0
+ tenk2   |  345 |  9996 |    0 | 0
 (1 row)

So it looks to me like a possible cause of the failure, and I wonder
whether checks of query plans should be immune to such changes, or whether
the results of VACUUM ANALYZE should be 100% stable?

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGKYNHmL_DhmVRiidHv6YLAL8jViifwwn2ABY__Y3BCphg%40mail.gmail.com

Best regards,
Alexander

diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 29b11f11aa..6a7bb6b7a9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2079,6 +2079,32 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
  ->  Seq Scan on tenk2 b
 (11 rows)
 
+explain (verbose)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+   QUERY PLAN   
+
+ Hash Join  (cost=1087.50..13845.00 rows=100 width=488)
+   Output: a.unique1, a.unique2, a.two, a.four, a.ten, a.twenty, a.hundred, a.thousand, a.twothousand, a.fivethous, a.tenthous, a.odd, a.even, a.stringu1, a.stringu2, a.string4, b.unique1, b.unique2, b.two, b.four, b.ten, b.twenty, b.hundred, b.thousand, b.twothousand, b.fivethous, b.tenthous, b.odd, b.even, b.stringu1, b.stringu2, b.string4
+   Hash Cond: (c.odd = b.odd)
+   ->  Hash Join  (cost=517.50..2000.00 rows=1 width=248)
+ Output: a.unique1, a.unique2, a.two, a.four, a.ten, a.twenty, 

Re: Properly pathify the union planner

2024-03-26 Thread Alexander Lakhin

Hello David,

25.03.2024 04:43, David Rowley wrote:

I didn't see that as a reason not to push this patch as this occurs
both with and without this change, so I've now pushed this patch.


Please look at a new assertion failure, that is triggered by the following
query:
SELECT count(*) FROM (
  WITH q1(x) AS (SELECT 1)
  SELECT FROM q1 UNION SELECT FROM q1
) qu;

TRAP: failed Assert("lg != NULL"), File: "planner.c", Line: 7941, PID: 1133017

Best regards,
Alexander




Re: Parallel Aggregates for string_agg and array_agg

2024-03-26 Thread Alexander Lakhin

Hello David,

23.01.2023 07:37, David Rowley wrote:

I've now pushed this.



I've discovered that the test query:
-- Ensure parallel aggregation is actually being used.
explain (costs off) select * from v_pagg_test order by y;

added by 16fd03e95 fails sometimes. For instance:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=urutu=2024-03-19%2021%3A04%3A05

--- /home/bf/bf-build/urutu/HEAD/pgsql/src/test/regress/expected/aggregates.out 
2024-02-24 06:42:47.039073180 +
+++ 
/home/bf/bf-build/urutu/HEAD/pgsql.build/src/test/regress/results/aggregates.out
 2024-03-19 22:24:18.155876135 +
@@ -1993,14 +1993,16 @@
  Sort Key: pagg_test.y, (((unnest(regexp_split_to_array((string_agg((pagg_test.x)::text, ','::text)), 
','::text::integer)

  ->  Result
    ->  ProjectSet
- ->  Finalize HashAggregate
+ ->  Finalize GroupAggregate
...

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus=2024-02-28%2007%3A38%3A27

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet=2024-02-08%2008%3A47%3A45

I suspect that these failures are caused by autovacuum.

I could reproduce this plan change when running multiple tests in
parallel, and also with the attached test patch applied (several sleeps are
needed for autovacuum/relation_needs_vacanalyze() to get a non-zero
mod_since_analyze value from pgstat):
TEMP_CONFIG=/tmp/temp.config TESTS="test_setup create_index create_aggregate 
aggregates" make -s check-tests

where /tmp/temp.config is:
autovacuum_naptime = 1
log_autovacuum_min_duration = 0

log_min_messages = INFO
log_min_error_statement = log
log_statement = 'all'

With EXPLAIN (VERBOSE), I see a slight change in the Seq Scan cost/rows
estimate:
... ->  Parallel Seq Scan on public.pagg_test  (cost=0.00..48.99 rows=2599 
width=8)
vs
.. ->  Parallel Seq Scan on public.pagg_test  (cost=0.00..48.00 rows=2500 
width=8)
(after automatic analyze of table "regression.public.pagg_test")

Best regards,
Alexander

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index d54a255e58..3de98c916d 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1931,6 +1931,12 @@ select string_agg(v, decode('ee', 'hex')) from bytea_test_table;
 drop table bytea_test_table;
 -- Test parallel string_agg and array_agg
 create table pagg_test (x int, y int);
+select pg_sleep(3);
+ pg_sleep 
+--
+ 
+(1 row)
+
 insert into pagg_test
 select (case x % 4 when 1 then null else x end), x % 10
 from generate_series(1,5000) x;
@@ -1967,6 +1973,12 @@ from (
 	) a1
 ) a2
 group by y;
+select pg_sleep(3);
+ pg_sleep 
+--
+ 
+(1 row)
+
 -- Ensure results are correct.
 select * from v_pagg_test order by y;
  y | tmin | tmax | tndistinct | bmin | bmax | bndistinct | amin | amax | andistinct | aamin | aamax | aandistinct 
@@ -1983,6 +1995,12 @@ select * from v_pagg_test order by y;
  9 |   19 | 4999 |250 | 1019 | 999  |250 |   19 | 4999 |250 |19 |  4999 | 250
 (10 rows)
 
+select pg_sleep(3);
+ pg_sleep 
+--
+ 
+(1 row)
+
 -- Ensure parallel aggregation is actually being used.
 explain (costs off) select * from v_pagg_test order by y;
   QUERY PLAN  
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 441e01d150..25dd90ec65 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -750,6 +750,7 @@ drop table bytea_test_table;
 
 -- Test parallel string_agg and array_agg
 create table pagg_test (x int, y int);
+select pg_sleep(3);
 insert into pagg_test
 select (case x % 4 when 1 then null else x end), x % 10
 from generate_series(1,5000) x;
@@ -789,9 +790,11 @@ from (
 ) a2
 group by y;
 
+select pg_sleep(3);
 -- Ensure results are correct.
 select * from v_pagg_test order by y;
 
+select pg_sleep(3);
 -- Ensure parallel aggregation is actually being used.
 explain (costs off) select * from v_pagg_test order by y;
 


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-26 Thread Alexander Lakhin

Hello Alvaro,

21.03.2024 15:07, Alvaro Herrera wrote:

Given that Michaël is temporarily gone, I propose to push the attached
tomorrow.


Please look at a new anomaly introduced with 374c7a229.
Starting from that commit, the following erroneous query:
CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;

triggers an assertion failure:
TRAP: failed Assert("relation->rd_rel->relam == InvalidOid"), File: 
"relcache.c", Line: 1219, PID: 3706301

with the stack trace:
...
#4  0x7fe53ced67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x55f28555951e in ExceptionalCondition (conditionName=conditionName@entry=0x55f285744788 
"relation->rd_rel->relam == InvalidOid", fileName=fileName@entry=0x55f285743f1c "relcache.c", 
lineNumber=lineNumber@entry=1219)

    at assert.c:66
#6  0x55f285550450 in RelationBuildDesc (targetRelId=targetRelId@entry=16385, insertIt=insertIt@entry=false) at 
relcache.c:1219
#7  0x55f285550769 in RelationClearRelation (relation=relation@entry=0x7fe5310dd178, rebuild=rebuild@entry=true) at 
relcache.c:2667

#8  0x55f285550c41 in RelationFlushRelation (relation=0x7fe5310dd178) at 
relcache.c:2850
#9  0x55f285550ca0 in RelationCacheInvalidateEntry (relationId=) at relcache.c:2921
#10 0x55f285542551 in LocalExecuteInvalidationMessage (msg=0x55f2861b3160) 
at inval.c:738
#11 0x55f28554159b in ProcessInvalidationMessages (group=0x55f2861b2e6c, func=func@entry=0x55f2855424a8 
) at inval.c:518

#12 0x55f285542740 in CommandEndInvalidationMessages () at inval.c:1180
#13 0x55f28509cbbd in AtCCI_LocalCache () at xact.c:1550
#14 0x55f28509e88e in CommandCounterIncrement () at xact.c:1116
#15 0x55f2851d0c8b in DefineRelation (stmt=stmt@entry=0x55f2861803b0, relkind=relkind@entry=102 'f', ownerId=10, 
ownerId@entry=0, typaddress=typaddress@entry=0x0,
    queryString=queryString@entry=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;") at 
tablecmds.c:1008
#16 0x55f28540945d in ProcessUtilitySlow (pstate=pstate@entry=0x55f2861a9dc0, pstmt=pstmt@entry=0x55f286180510, 
queryString=queryString@entry=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;",
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, 
dest=0x55f2861807d0, qc=0x7fff15b5d7c0) at utility.c:1203
#17 0x55f28540911f in standard_ProcessUtility (pstmt=0x55f286180510, queryString=0x55f28617f870 "CREATE FOREIGN 
TABLE fp PARTITION OF pg_am DEFAULT SERVER x;", readOnlyTree=, context=PROCESS_UTILITY_TOPLEVEL,

    params=0x0, queryEnv=0x0, dest=0x55f2861807d0, qc=0x7fff15b5d7c0) at 
utility.c:1067
...

On 374c7a229~1 it fails with
ERROR:  "pg_am" is not partitioned

Best regards,
Alexander




Re: Built-in CTYPE provider

2024-03-24 Thread Alexander Lakhin

Hello Jeff,

21.03.2024 03:13, Jeff Davis wrote:

On Tue, 2024-03-19 at 13:41 +0100, Peter Eisentraut wrote:

* v25-0002-Support-C.UTF-8-locale-in-the-new-builtin-collat.patch

Looks ok.

Committed.


Please look at a Valgrind-detected error caused by the following query
(starting from f69319f2f):
SELECT lower('Π' COLLATE pg_c_utf8);

==00:00:00:03.487 1429669== Invalid read of size 1
==00:00:00:03.487 1429669==    at 0x7C64A5: convert_case (unicode_case.c:107)
==00:00:00:03.487 1429669==    by 0x7C: unicode_strlower (unicode_case.c:70)
==00:00:00:03.487 1429669==    by 0x66B218: str_tolower (formatting.c:1698)
==00:00:00:03.488 1429669==    by 0x6D6C55: lower (oracle_compat.c:55)

Best regards,
Alexander




Re: Test 031_recovery_conflict.pl is not immune to autovacuum

2024-03-20 Thread Alexander Lakhin

Hello Melanie,

20.03.2024 16:15, Melanie Plageman wrote:

Seems like we could just add autovacuum_enabled=false to the table like this:
diff --git a/src/test/recovery/t/031_recovery_conflict.pl
b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823fd..65bc858c02d 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
  my $table2 = "test_recovery_conflict_table2";
  $node_primary->safe_psql(
 $test_db, qq[
-CREATE TABLE ${table1}(a int, b int);
+CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
  INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
  CREATE TABLE ${table2}(a int, b int);
  ]);


Thanks for paying attention to it!

With such a modification applied, I've got another failure (on iteration 2):
[13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
done
timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl 
line 318.

# Postmaster PID for node "primary" is 1523036
### Stopping node "primary" using mode immediate

031_recovery_conflict_standby.log really doesn't contain the expected
message. I can share log files from successful and failed test runs if
they would be helpful, or I'll investigate this case today/tomorrow.

Best regards,
Alexander




Test 031_recovery_conflict.pl is not immune to autovacuum

2024-03-20 Thread Alexander Lakhin

Hello hackers,

Among the many recoveryCheck (more concretely, 027_stream_regress) failures
that occurred on a number of buildfarm animals after switching to meson, which
can be explained by timeouts, I saw a different failure on adder:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-03-18%2023%3A43%3A00
[23:48:52.521](9.831s) ok 13 - startup deadlock: cursor holding conflicting 
pin, also waiting for lock, established
[23:55:13.749](381.228s) # poll_query_until timed out executing this query:
#
# SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
#
# expecting this output:
# waiting
# last actual query output:
#
# with stderr:
[23:55:13.763](0.013s) not ok 14 - startup deadlock: lock acquisition is waiting

and I suspect that it might be caused by autovacuum.

I've managed to reproduce it locally (running 10 tests in parallel on a
2-core VM with disk bandwidth limited to 80MB/sec, I get failures on
iterations 10, 1, 3) and observed the following (with wal_debug = on):
031_recovery_conflict_standby.log:
2024-03-20 04:12:06.519 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR 
FOR SELECT a FROM test_recovery_conflict_table1;
2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: FETCH FORWARD FROM 
test_recovery_conflict_cursor;

2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: 
SELECT * FROM test_recovery_conflict_table2;
...
2024-03-20 04:12:07.073 UTC|||65fa620d.111ec8|LOG:  REDO @ 0/3438360; LSN 0/3438460: prev 0/3438338; xid 0; len 9; 
blkref #0: rel 1663/16385/16392, blk 0 - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, 
isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]

2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|LOG:  recovery still waiting 
after 11.241 ms: recovery conflict on buffer pin
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|CONTEXT:  WAL redo at 0/3438360 for Heap2/PRUNE: snapshotConflictHorizon: 
0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 
100, 101]; blkref #0: rel 1663/16385/16392, blk 0
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|ERROR: canceling statement due to conflict with recovery at 
character 15

2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|DETAIL: User 
transaction caused buffer deadlock with recovery.
...
2024-03-20 04:12:08.093 UTC|vagrant|postgres|65fa6216.111f1a|LOG: statement: SELECT 'waiting' FROM pg_locks WHERE 
locktype = 'relation' AND NOT granted;


031_recovery_conflict_primary.log:
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG:  Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, 
cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG:  Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, 
cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|LOG:  INSERT @ 0/3438460:  - Heap2/PRUNE: snapshotConflictHorizon: 0, 
nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 
101]
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|CONTEXT:  while scanning block 0 of relation 
"public.test_recovery_conflict_table1"

...
2024-03-20 04:12:05.981 UTC|||65fa6215.111f02|LOG:  automatic vacuum of table 
"test_db.public.test_recovery_conflict_table1": index scans: 0


The corresponding fragment of 031_recovery_conflict.pl:
$res = $psql_standby->query_until(
    qr/^1$/m, qq[
    BEGIN;
    -- hold pin
    DECLARE $cursor1 CURSOR FOR SELECT a FROM $table1;
    FETCH FORWARD FROM $cursor1;
    -- wait for lock held by prepared transaction
    SELECT * FROM $table2;
    ]);
ok(1,
    "$sect: cursor holding conflicting pin, also 

Re: Recent 027_streaming_regress.pl hangs

2024-03-19 Thread Alexander Lakhin

14.03.2024 23:56, Tom Lane wrote:

Thomas Munro  writes:

On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin  wrote:

Could it be that the timeout (360 sec?) is just not enough for the test
under the current (changed due to switch to meson) conditions?

But you're right that under meson the test takes a lot longer, I guess
due to increased concurrency:

What it seems to be is highly variable.  Looking at calliphoridae's
last half dozen successful runs, I see reported times for
027_stream_regress anywhere from 183 to 704 seconds.  I wonder what
else is happening on that machine.  Also, this is probably not
helping anything:

'extra_config' => {
   ...
   'fsync = on'

I would suggest turning that off and raising wait_timeout a good
deal, and then we'll see if calliphoridae gets any more stable.


I could reproduce similar failures with
PG_TEST_EXTRA=wal_consistency_checking only, running 5 tests in parallel on
a slowed-down VM, so that test duration increased to ~1900 seconds, but
perhaps that buildfarm machine has a different bottleneck (I/O?) or its
concurrent workload is not as uniform as in my experiments.

Meanwhile, I've analyzed failed test logs from buildfarm and calculated
the percentage of WAL replayed before timeout.
For instance, one of the failures:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2022%3A36%3A40
standby_1.log:
2024-03-18 22:38:22.743 UTC [2010896][walreceiver][:0] LOG:  started streaming 
WAL from primary at 0/300 on timeline 1
...
2024-03-18 22:50:02.439 UTC [2004203][checkpointer][:0] LOG: recovery restart 
point at 0/E00E030
2024-03-18 22:50:02.439 UTC [2004203][checkpointer][:0] DETAIL: Last completed transaction was at log time 2024-03-18 
22:41:26.647756+00.
2024-03-18 22:50:12.841 UTC [2010896][walreceiver][:0] FATAL:  could not receive data from WAL stream: server closed the 
connection unexpectedly


primary.log:
2024-03-18 22:38:23.754 UTC [2012240][client backend][3/3:0] LOG: statement: 
GRANT ALL ON SCHEMA public TO public;
# ^^^ One of the first records produced by `make check`
...
2024-03-18 22:41:26.647 UTC [2174047][client backend][10/752:0] LOG:  statement: ALTER VIEW my_property_secure SET 
(security_barrier=false);

# ^^^ A record near the last completed transaction on standby
...
2024-03-18 22:44:13.226 UTC [2305844][client backend][22/3784:0] LOG:  
statement: DROP TABLESPACE regress_tblspace_renamed;
# ^^^ One of the last records produced by `make check`

\set t0 '22:38:23.754' \set t1 '22:44:13.226' \set tf '22:41:26.647756'
select extract(epoch from (:'tf'::time - :'t0'::time)) /
       extract(epoch from (:'t1'::time - :'t0'::time));
~52%

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2018%3A58%3A58
~48%

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2016%3A41%3A13
~43%

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-18%2015%3A47%3A09
~36%

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-03-15%2011%3A24%3A38
~87%

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-17%2021%3A55%3A41
~36%

So it still looks like a performance-related issue to me. And yes, changing
fsync = off -> on greatly increases (~3x) the overall test duration in my
environment.

Best regards,
Alexander




Re: Add the ability to limit the amount of memory that can be allocated to backends.

2024-03-15 Thread Alexander Lakhin

Hello Anton,

14.03.2024 23:36, Anton A. Melnikov wrote:

On 13.03.2024 10:41, Anton A. Melnikov wrote:


Here is a version updated for the current master.



While updating the patch I mistakenly added double counting of deallocated
blocks.
That's why the tests in the patch tester failed.
Fixed it and squashed fix 0002 with 0001.
Here is the fixed version.


Please try the following with the patches applied:
echo "shared_buffers = '1MB'
max_total_backend_memory = '10MB'" > /tmp/extra.config

CPPFLAGS="-Og" ./configure --enable-tap-tests --enable-debug --enable-cassert 
...
TEMP_CONFIG=/tmp/extra.config make check

It fails for me as follows:
...
# postmaster did not respond within 60 seconds, examine 
".../src/test/regress/log/postmaster.log" for the reason
...
src/test/regress/log/postmaster.log contains:
...
TRAP: failed Assert("ret != NULL"), File: "mcxt.c", Line: 1327, PID: 4109270
TRAP: failed Assert("ret != NULL"), File: "mcxt.c", Line: 1327, PID: 4109271
postgres: autovacuum launcher (ExceptionalCondition+0x69)[0x55ce441fcc6e]
postgres: autovacuum launcher (palloc0+0x0)[0x55ce4422eb67]
postgres: logical replication launcher 
(ExceptionalCondition+0x69)[0x55ce441fcc6e]
postgres: autovacuum launcher (InitDeadLockChecking+0xa6)[0x55ce4408a6f0]
postgres: logical replication launcher (palloc0+0x0)[0x55ce4422eb67]
postgres: logical replication launcher 
(InitDeadLockChecking+0x45)[0x55ce4408a68f]
postgres: autovacuum launcher (InitProcess+0x600)[0x55ce4409c6f2]
postgres: logical replication launcher (InitProcess+0x600)[0x55ce4409c6f2]
postgres: autovacuum launcher (+0x44b4e2)[0x55ce43ff24e2]
...
grep TRAP src/test/regress/log/postmaster.log | wc -l
445

Best regards,
Alexander




Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Alexander Lakhin

Hello Thomas and Michael,

14.03.2024 06:16, Thomas Munro wrote:


Yeah, I was wondering if its checkpoint delaying logic might have
got the checkpointer jammed or something like that, but I don't
currently see how.  Yeah, the replay of bulk newpages could be
relevant, but it's not exactly new technology.  One thing I wondered
about is whether the Perl "wait for catchup" thing, which generates
large volumes of useless log, could be somehow changed to actually
show the progress after some time.  Something like "I'm still waiting
for this replica to reach LSN X, but it has so far only reported LSN
Y, and here's a dump of the WAL around there"?


I have perhaps reproduced the issue here (at least I'm seeing something
similar) and am going to investigate it in the coming days, but what
confuses me now is the duration of poll_query_until:
For the failure you referenced:
[15:55:54.740](418.725s) # poll_query_until timed out executing this query:

And a couple of others:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-08%2000%3A34%3A06
[00:45:57.747](376.159s) # poll_query_until timed out executing this query:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-03-04%2016%3A32%3A17
[16:45:24.870](407.970s) # poll_query_until timed out executing this query:

Could it be that the timeout (360 sec?) is just not enough for the test
under the current (changed due to switch to meson) conditions?

Best regards,
Alexander




Re: pg16: XX000: could not find pathkey item to sort

2024-03-13 Thread Alexander Lakhin

Hello David,

09.10.2023 07:13, David Rowley wrote:

On Mon, 9 Oct 2023 at 12:42, David Rowley  wrote:

Maybe it's worth checking the total planning time spent in a run of
the regression tests with and without the patch to see how much
overhead it adds to the "average case".

I've now pushed the patch that trims off the Pathkeys for the ORDER BY
/ DISTINCT aggregates.



I've stumbled upon the same error, but this time it apparently has another
cause. It can be produced (on REL_16_STABLE and master) as follows:
CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE td PARTITION OF t DEFAULT;
CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2);
SET enable_partitionwise_aggregate = on;
SET parallel_setup_cost = 0;
SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;

ERROR:  could not find pathkey item to sort

`git bisect` for this anomaly blames the same commit 1349d2790.

Best regards,
Alexander




Re: Spurious pgstat_drop_replslot() call

2024-03-12 Thread Alexander Lakhin

Hello Bertrand and Michael,

12.03.2024 09:17, Bertrand Drouvot wrote:



May I
suggest the attached additions with LWLockHeldByMeInMode to make sure
that the stats are dropped and created while we hold
ReplicationSlotAllocationLock?

Yeah, makes fully sense and looks good to me.


Sorry for going a bit off-topic, but I've remembered an anomaly with a
similar assert:
https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org

Maybe you would find it worth considering while working in this area...
(I've just run that reproducer on b36fbd9f8 and confirmed that the
assertion failure is still there.)

Best regards,
Alexander




Re: DROP DATABASE is interruptible

2024-03-12 Thread Alexander Lakhin

Hi,

13.07.2023 23:52, Andres Freund wrote:


Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
part. But also a lot of other conflicts in tests... Took me 5-6 hours or
so.
But I now finally pushed the fixes. Hope the buildfarm agrees with it...

Thanks for the review!


I've discovered that the test 037_invalid_database, introduced with
c66a7d75e, hangs when the server is built with -DCLOBBER_CACHE_ALWAYS or
run with debug_discard_caches = 1 set via TEMP_CONFIG:
echo "debug_discard_caches = 1" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ 
PROVE_TESTS="t/037*"
# +++ tap check in src/test/recovery +++
[09:05:48] t/037_invalid_database.pl .. 6/?

regress_log_037_invalid_database ends with:
[09:05:51.622](0.021s) # issuing query via background psql:
#   CREATE DATABASE regression_invalid_interrupt;
#   BEGIN;
#   LOCK pg_tablespace;
#   PREPARE TRANSACTION 'lock_tblspc';
[09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion

I see two backends waiting:
law  2420132 2420108  0 09:05 ?    00:00:00 postgres: node: law 
postgres [local] DROP DATABASE waiting
law  2420135 2420108  0 09:05 ?    00:00:00 postgres: node: law 
postgres [local] startup waiting

and the latter's stack trace:
#0  0x7f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at 
../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80, 
nevents=1) at latch.c:1570
#2  0x563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1, 
wait_event_info=50331648) at latch.c:1516
#3  0x563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at 
latch.c:538
#4  0x563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 ) at 
proc.c:1339

#5  0x563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, 
owner=0x563c40ea5af8) at lock.c:1816
#6  0x563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false, 
reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080

#7  0x563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116
#8  0x563c3f537aff in relation_open (relationId=1213, lockmode=1) at 
relation.c:55
#9  0x563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44
#10 0x563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at 
catcache.c:980
#11 0x563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, 
touch_index=true) at catcache.c:1083
#12 0x563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184
#13 0x563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317
#14 0x563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law", 
useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177

#15 0x563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", 
username=0x563c40e53fe8 "law") at postgres.c:4229
#16 0x563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475

It looks like no new backend can start while pg_tablespace is locked,
because that relation has to be opened whenever a new relcache init file is
needed during backend initialization.

Best regards,
Alexander




Re: postgres_fdw test timeouts

2024-03-08 Thread Alexander Lakhin

Hello Nathan,

08.03.2024 01:00, Nathan Bossart wrote:

On Sun, Dec 10, 2023 at 12:00:01PM +0300, Alexander Lakhin wrote:

So I would not say that it's a dominant failure for now, and given that
04a09ee lives in master only, maybe we can save two commits (Revert +
Revert of revert) while moving to a more persistent solution.

I just checked in on this one to see whether we needed to create an "open
item" for v17.  While I'm not seeing the failures anymore, the patch that
Alexander claimed should fix it [0] doesn't appear to have been committed,
either.  Perhaps this was fixed by something else...

[0] 
https://postgr.es/m/CA%2BhUKGL0bikWSC2XW-zUgFWNVEpD_gEWXndi2PE5tWqmApkpZQ%40mail.gmail.com


I have re-run the tests and found out that the issue was fixed by
d3c5f37dd. It changed the body of the "while (PQisBusy(conn))" loop in
pgfdw_get_result() from:
            /* Data available in socket? */
            if (wc & WL_SOCKET_READABLE)
            {
                if (!PQconsumeInput(conn))
                    pgfdw_report_error(ERROR, NULL, conn, false, query);
            }
to:
    /* Consume whatever data is available from the socket */
    if (PQconsumeInput(conn) == 0)
    {
        /* trouble; expect PQgetResult() to return NULL */
        break;
    }

That is, the unconditional PQconsumeInput() call eliminates the test
timeout.
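
Just to illustrate the pattern (a minimal, self-contained sketch, not the
actual pgfdw_get_result() code; it waits with plain select() instead of
WaitLatchOrSocket()), a loop that consumes input on every wakeup looks like
this:

/* build with: gcc wait_demo.c -I$(pg_config --includedir) -lpq */
#include <sys/select.h>

#include "libpq-fe.h"

static PGresult *
wait_for_result(PGconn *conn)
{
    while (PQisBusy(conn))
    {
        fd_set      rfds;
        int         sock = PQsocket(conn);

        if (sock < 0)
            break;              /* connection is bad */

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);

        /* sleep until the socket becomes readable (no timeout here) */
        if (select(sock + 1, &rfds, NULL, NULL, NULL) < 0)
            break;

        /* consume whatever data is available, on every wakeup */
        if (PQconsumeInput(conn) == 0)
            break;              /* trouble; PQgetResult() will return NULL */
    }
    return PQgetResult(conn);
}

The essential difference from the old code is only that nothing is skipped
when the wait reports something other than socket readability.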

Best regards,
Alexander




Re: partitioning and identity column

2024-02-19 Thread Alexander Lakhin

20.02.2024 07:57, Ashutosh Bapat wrote:

Could you please name the functions you suspect, so that I can recheck them?
Perhaps we should consider fixing all such functions, in light of
b0f7dd915 and d57b7cc33...

Looks like the second commit has fixed all other places I knew except
Identity related functions. So worth fixing identity related functions
too. I see
dropconstraint_internal() has two calls to check_stack_depth() back to
back. The second one is not needed?


Yeah, that's funny. It looks like such double protection emerged because
Alvaro protected the function (in b0f7dd915) while it was already waiting for
check_stack_depth() to be added in the other thread (which resulted in
d57b7cc33).

Thank you for spending time on this!

Best regards,
Alexander




Re: partitioning and identity column

2024-02-19 Thread Alexander Lakhin

Hello Ashutosh,

19.02.2024 15:17, Ashutosh Bapat wrote:



Functions ATExecAddIdentity() and ATExecDropIdentity() are recursive too,
so I think they can be exploited as well.

not just Identity related functions, but many other functions in
tablecmds.c have that problem as I mentioned earlier.



Could you please name the functions you suspect, so that I can recheck them?
Perhaps we should consider fixing all such functions, in light of
b0f7dd915 and d57b7cc33...
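
To be concrete, the guard I mean is the usual one-liner at the top of each
recursing function; a sketch (the function name below is made up, while
check_stack_depth() and find_inheritance_children() are the existing backend
helpers) could look like:

#include "postgres.h"

#include "catalog/pg_inherits.h"
#include "miscadmin.h"

static void
at_exec_identity_recurse(Oid relid, LOCKMODE lockmode)
{
    List       *children;
    ListCell   *lc;

    /* error out cleanly instead of overrunning the stack on a deep tree */
    check_stack_depth();

    /* ... per-relation work on relid would go here ... */

    children = find_inheritance_children(relid, lockmode);
    foreach(lc, children)
        at_exec_identity_recurse(lfirst_oid(lc), lockmode);
}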

Best regards,
Alexander




Re: Removing unneeded self joins

2024-02-18 Thread Alexander Lakhin

18.02.2024 19:18, Alexander Korotkov wrote:

Attached is a draft patch fixing this query. Could you, please, recheck?


Yes, this patch fixes the behavior for that query (I've also tried several
similar queries). Though I don't know the code well enough to judge the
code change.

Best regards,
Alexander




