Hi,

Due to [1] I thought it'd be a good idea to write an isolation test for
testing postgres_fdw interruptability during connection establishment.

I was able to make that work - but unfortunately doing so requires preventing
a login from completing. The only way I could see to achieve that is to lock
one of the important tables. I ended up with
  step s2_hang_logins { LOCK pg_db_role_setting; }

However, I'm a bit worried that that might cause problems. It'll certainly
block progress in concurrent tests, given it's a shared relation. But locking
relevant non-shared relations causes more problems, because it'll e.g. prevent
querying pg_stat_activity.

Does anybody see another way to cause a login to hang as part of an isolation
test, in a controllable manner?  Or, if not, do you think we can get away with
locking pg_db_role_setting?


The other complexity is that isolationtester won't see the wait edge going
through postgres_fdw. My approach for that is to do that one wait in a DO
block loop, matching on application_name = 'isolation/interrupt/s1'.

I don't think we can teach isolationtester to understand such edges. I guess
we could teach it to wait for certain wait events though? But I'm not sure how
generally useful that is. IIRC Tom concluded in the past that it didn't get us
very far.


The test currently tests one termination case, because isolationtester will
just fail the next permutation if a connection is gone. I don't see an issue
fixing that?


I attached my current WIP patch for the test.


Note that the test will only work with the patches from [1] applied.


Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de
>From e188d6dc6c5bc0dea5ff33692b0f44a607f8e855 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sat, 21 Jan 2023 13:27:40 -0800
Subject: [PATCH v4 6/6] wip: test that postgres_fdw can be interrupted

---
 contrib/postgres_fdw/Makefile               |  1 +
 contrib/postgres_fdw/expected/interrupt.out | 95 +++++++++++++++++++++
 contrib/postgres_fdw/meson.build            |  5 ++
 contrib/postgres_fdw/specs/interrupt.spec   | 68 +++++++++++++++
 4 files changed, 169 insertions(+)
 create mode 100644 contrib/postgres_fdw/expected/interrupt.out
 create mode 100644 contrib/postgres_fdw/specs/interrupt.spec

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index c1b0cad453f..eb141442dee 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -17,6 +17,7 @@ EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
 
 REGRESS = postgres_fdw
+ISOLATION = interrupt
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/expected/interrupt.out b/contrib/postgres_fdw/expected/interrupt.out
new file mode 100644
index 00000000000..fe32e8ea27b
--- /dev/null
+++ b/contrib/postgres_fdw/expected/interrupt.out
@@ -0,0 +1,95 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2_begin s2_hang_logins s1_select_remote s3_cancel_s1 s2_commit
+step s2_begin: BEGIN;
+step s2_hang_logins: LOCK pg_db_role_setting;
+step s1_select_remote: SELECT count(*) FROM remote_t1; <waiting ...>
+step s3_cancel_s1: 
+  SELECT wait_for_s1();
+  SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+
+wait_for_s1
+-----------
+           
+(1 row)
+
+pg_cancel_backend
+-----------------
+t                
+(1 row)
+
+step s1_select_remote: <... completed>
+ERROR:  canceling statement due to user request
+step s2_commit: COMMIT;
+
+starting permutation: s2_begin s2_hang_logins s1_select_remote s2_commit
+step s2_begin: BEGIN;
+step s2_hang_logins: LOCK pg_db_role_setting;
+step s1_select_remote: SELECT count(*) FROM remote_t1; <waiting ...>
+step s2_commit: COMMIT;
+step s1_select_remote: <... completed>
+count
+-----
+    0
+(1 row)
+
+
+starting permutation: s2_begin s2_hang_select s1_select_remote s3_cancel_s1 s2_commit
+step s2_begin: BEGIN;
+step s2_hang_select: LOCK local_t1;
+step s1_select_remote: SELECT count(*) FROM remote_t1; <waiting ...>
+step s3_cancel_s1: 
+  SELECT wait_for_s1();
+  SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+
+wait_for_s1
+-----------
+           
+(1 row)
+
+pg_cancel_backend
+-----------------
+t                
+(1 row)
+
+step s1_select_remote: <... completed>
+ERROR:  canceling statement due to user request
+step s2_commit: COMMIT;
+
+starting permutation: s2_begin s2_hang_select s1_select_remote s2_commit
+step s2_begin: BEGIN;
+step s2_hang_select: LOCK local_t1;
+step s1_select_remote: SELECT count(*) FROM remote_t1; <waiting ...>
+step s2_commit: COMMIT;
+step s1_select_remote: <... completed>
+count
+-----
+    0
+(1 row)
+
+
+starting permutation: s2_begin s2_hang_logins s1_select_remote s3_terminate_s1 s2_commit
+step s2_begin: BEGIN;
+step s2_hang_logins: LOCK pg_db_role_setting;
+step s1_select_remote: SELECT count(*) FROM remote_t1; <waiting ...>
+step s3_terminate_s1: 
+  SELECT wait_for_s1();
+  SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+
+wait_for_s1
+-----------
+           
+(1 row)
+
+pg_terminate_backend
+--------------------
+t                   
+(1 row)
+
+step s1_select_remote: <... completed>
+FATAL:  terminating connection due to administrator command
+server closed the connection unexpectedly
+	This probably means the server terminated abnormally
+	before or while processing the request.
+
+step s2_commit: COMMIT;
diff --git a/contrib/postgres_fdw/meson.build b/contrib/postgres_fdw/meson.build
index 2b451f165e1..4ae5e0e2f9b 100644
--- a/contrib/postgres_fdw/meson.build
+++ b/contrib/postgres_fdw/meson.build
@@ -39,4 +39,9 @@ tests += {
     ],
     'regress_args': ['--dlpath', meson.build_root() / 'src/test/regress'],
   },
+  'isolation': {
+    'specs': [
+      'interrupt',
+    ],
+  },
 }
diff --git a/contrib/postgres_fdw/specs/interrupt.spec b/contrib/postgres_fdw/specs/interrupt.spec
new file mode 100644
index 00000000000..5b4d77450d7
--- /dev/null
+++ b/contrib/postgres_fdw/specs/interrupt.spec
@@ -0,0 +1,68 @@
+setup {
+  CREATE EXTENSION postgres_fdw;
+
+  DO $d$
+      BEGIN
+          EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
+              OPTIONS (dbname '$$||current_database()||$$',
+                       port '$$||current_setting('port')||$$'
+              )$$;
+      END;
+  $d$;
+
+  CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
+
+  CREATE TABLE local_t1 (id int);
+  CREATE FOREIGN TABLE remote_t1 (id int) SERVER loopback OPTIONS (table_name 'local_t1');
+
+  -- wait until postgres_fdw is waiting, isolation tester can't see that wait edge
+  CREATE FUNCTION wait_for_s1() RETURNS void LANGUAGE plpgsql AS $f$
+    BEGIN
+      WHILE NOT EXISTS(
+      SELECT * FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1' AND wait_event = 'Extension')
+      LOOP
+      -- don't use too much CPU
+      PERFORM pg_sleep(0.05);
+      END LOOP;
+    END;
+  $f$
+}
+
+teardown {
+    DROP EXTENSION postgres_fdw CASCADE;
+    DROP TABLE local_t1;
+    DROP FUNCTION wait_for_s1();
+}
+
+session "s1"
+step s1_select_remote {SELECT count(*) FROM remote_t1;}
+
+session "s2"
+step s2_begin { BEGIN; }
+step s2_hang_logins { LOCK pg_db_role_setting; }
+step s2_hang_select { LOCK local_t1; }
+step s2_commit { COMMIT; }
+
+session "s3"
+step s3_cancel_s1 {
+  SELECT wait_for_s1();
+  SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+}
+
+step s3_terminate_s1 {
+  SELECT wait_for_s1();
+  SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+}
+
+# test that postgres_fdw can be interrupted during connection establishment
+permutation s2_begin s2_hang_logins s1_select_remote(*) s3_cancel_s1 s2_commit
+
+# for completeness: verify that committing allows to continue too
+permutation s2_begin s2_hang_logins s1_select_remote(*) s2_commit
+
+# test that statements can be interrupted
+permutation s2_begin s2_hang_select s1_select_remote(*) s3_cancel_s1 s2_commit
+permutation s2_begin s2_hang_select s1_select_remote(*) s2_commit
+
+# test that terminations work - needs to be last, as the connection is gone afterwards
+permutation s2_begin s2_hang_logins s1_select_remote(*) s3_terminate_s1 s2_commit
-- 
2.38.0

Reply via email to