Hi,

On Fri, Sep 19, 2025 at 12:35 AM Matheus Alcantara
<[email protected]> wrote:
>
> On Mon Sep 15, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
> > While the WAL-based approach discussed on another thread is promising,
> > I think it would not be acceptable for back branches as it requires
> > quite a lot of refactoring. Given that this is a long-standing bug in
> > listen/notify, I think we can continue discussing how to fix the issue
> > on backbranches on this thread.
> >
> Please see the new attached patch, it has a different implementation
> that I've previously posted which is based on the idea that Arseniy
> posted on [1].
>

Thank you for the new version.

> This new version include the "committed" field on AsyncQueueEntry struct
> so that we can use this info when processing the notification instead of
> call TransactionIdDidCommit()
>
> The "committed" field is set to true when the AsyncQueueEntry is being
> added on the SLRU page buffer when the PreCommit_Notify() is called. If
> an error occurs between the PreCommit_Notify() and AtCommit_Notify() the
> AtAbort_Notify() will be called and will set the "committed" field to
> false for the notifications inside the aborted transaction.
>
> It's a bit tricky to know at AtAbort_Notify() which notifications were
> added on the SLRU page buffer by the aborted transaction, so I created a
> new data structure and a global variable to keep track of this
> information. See the commit message for more information.
>

I like this approach. We got rid of dependency on clog and don't limit
vacuum. Several points about the fix:

Is it correct to remember and reuse slru slots here? IIUC we can't do
it if we don't hold SLRU bank lock, because by the time we get in
AtAbort_Notify() the queue page could be already evicted. Probably we
need to use SimpleLruReadPage after we acquire the lock in
AtAbort_Notify()?

I think adding a boolean 'committed' is a good approach, but what do
you think about setting the queue head back to the position where
aborted transaction notifications start? We can do such a reset in
AtAbort_Notify(). So instead of marking notifications as
'commited=false' we completely erase them from the queue by moving the
queue head back. From listeners perspective if there is a notification
of completed transaction in the queue - it's always a committed
transaction, so again get rid of TransactionIdDidCommit() call. It
seems like a simpler approach because we don't need to remember all
notifications positions in the queue and don't need the additional
field 'committed'. All we need is to remember the head position before
we write anything to the queue, and reset it back if there is an
abort. IIUC Listeners will never send such erased notifications:
- while the aborted transaction is looking like 'in progress',
listeners can't send its notifications.
- by the time the aborted transaction is completed, the head is
already set back so erased notifications are located after the queue
head and listeners can't read it.

> On the previously patch that I've posted I've created a TAP test to
> reproduce the issue with the VACUUM FREEZE, this new version also
> include this test and also a new test case that use the injection points
> extension to force an error between the PreCommit_Notify() and
> AtCommit_Notify() so that we can ensure that these notifications of an
> aborted transaction are not visible to other listener backends.
>

I think it's a good test to have. FWIW there is a way to reproduce the
test condition without the injection point. We can use the fact that
serializable conflicts are checked after tx adds notifications to the
queue. Please find the attached patch with the example tap test. Not
sure if using injections points is more preferable?


Best regards,
Arseniy Mukhin
From 08b98c34d418fd40f2351ad09d216317a72665a4 Mon Sep 17 00:00:00 2001
From: Arseniy Mukhin <[email protected]>
Date: Fri, 19 Sep 2025 15:24:27 +0300
Subject: [PATCH] Adds LISTEN/NOTIFY aborted_tx_notification TAP test

---
 .../modules/test_listen_notify/meson.build    |  2 +-
 .../t/002_aborted_tx_notifies.pl              | 67 +++++++++++++++++++
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/test_listen_notify/t/002_aborted_tx_notifies.pl

diff --git a/src/test/modules/test_listen_notify/meson.build b/src/test/modules/test_listen_notify/meson.build
index f0a2b5058e4..a68052cd353 100644
--- a/src/test/modules/test_listen_notify/meson.build
+++ b/src/test/modules/test_listen_notify/meson.build
@@ -7,7 +7,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_xid_freeze.pl',
-      't/002_transaction.pl',
+      't/002_aborted_tx_notifies.pl'
     ],
   },
 }
diff --git a/src/test/modules/test_listen_notify/t/002_aborted_tx_notifies.pl b/src/test/modules/test_listen_notify/t/002_aborted_tx_notifies.pl
new file mode 100644
index 00000000000..ed6abf9c576
--- /dev/null
+++ b/src/test/modules/test_listen_notify/t/002_aborted_tx_notifies.pl
@@ -0,0 +1,67 @@
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use File::Path qw(mkpath);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->init;
+$node->start;
+
+# Test checks that listeners do not receive notifications from aborted
+# transaction even if notifications have been added to the listen/notify
+# queue. To reproduce it we use the fact that serializable conflicts
+# are checked after tx adds notifications to the queue.
+
+# Setup
+$node->safe_psql('postgres', 'CREATE TABLE t1 (a bigserial);');
+
+# Listener
+my $psql_listener = $node->background_psql('postgres');
+$psql_listener->query_safe('LISTEN ch;');
+
+# Session1. Start SERIALIZABLE tx and add a notification.
+my $psql_session1 = $node->background_psql('postgres');
+$psql_session1->query_safe("
+	BEGIN ISOLATION LEVEL SERIALIZABLE;
+	SELECT * FROM t1;
+	INSERT INTO t1 DEFAULT VALUES;
+	NOTIFY ch,'committed';
+");
+
+# Session2. Start SERIALIZABLE tx, add a notification and introduce a conflict
+# with session1.
+my $psql_session2 = $node->background_psql('postgres', on_error_stop => 0);
+$psql_session2->query_safe("
+	BEGIN ISOLATION LEVEL SERIALIZABLE;
+	SELECT * FROM t1;
+	INSERT INTO t1 DEFAULT VALUES;
+	NOTIFY ch,'aborted';
+");
+
+# Session1 should be committed successfully. Listeners must receive session1
+# notifications.
+$psql_session1->query_safe("COMMIT;");
+
+# Session2 should be aborted due to the conflict with session1. Transaction
+# is aborted after adding notifications to the listen/notify queue, but
+# listeners should not receive session2 notifications.
+$psql_session2->query("COMMIT;");
+
+# send another notification after aborted
+$node->safe_psql('postgres', "NOTIFY ch, 'next_committed';");
+
+# fetch notifications
+my $res = $psql_listener->query_safe('begin; commit;');
+
+# check received notifications
+my @lines = split('\n', $res);
+is(@lines, 2, 'received all committed notifications');
+like($lines[0], qr/Asynchronous notification "ch" with payload "committed" received/);
+like($lines[1], qr/Asynchronous notification "ch" with payload "next_committed" received/);
+
+done_testing();
\ No newline at end of file
-- 
2.43.0

Reply via email to