Dear Paul, Mutaamba, Here are updated patches. Based on the Robert's suggestion, I separated into two parts. 0001 fixed the original issue and 0002 prohibited the slot manipulation in single-user mode. I want to focus on 0001 first because on one would argue it.
All comments from you were included in 0002.
> The patch does not apply. The attached v5 fixes it with a small update
> to the context for the slot.h hunk.
Oh, I didn't realize because cfbot said OK. Anyway, new patch could be applied
atop HEAD.
> The commit message needs some explanation. Why are we doing this? What
> does the patch do? What alternatives did we consider?
Updated. How do you feel?
> The patch has no docs. If we plan to forbid some functions in
> single-user mode, we should document which ones (e.g. in func.sgml).
A statement was added.
> There are no tests. Do we have any tests at all for single-user mode?
> The only one I see is in recovery/t/017_shm.pl. What if we had tests
> that ran the regress suite in single-user mode? Basically instead of
> psql we run postgres --single? Do we expect a lot of it would fail? Is
> it small enough the test could maintain a diff that expects those
> failures? I'm not saying we should do that as part of this patch, but
> is there any interest in that? Since single-user mode is most often
> used when things are broken, it's a harsh place to hit a bug.
>
> The patch lacks source comments. Something on
> CheckSlotIsInSingleUserMode explaining why would be good.
Comments were added in all caller functions.
> In ReplicationSlotRelease, why only assert that `slot->active_pid !=
> 0`? Why not assert that it's MyProcPid (even outside single-user
> mode)? Can one backend really release a slot while another is using
> it? Can you drop it? Maybe you can copy it?
You meant we should assert `slot->active_pid == MyProcPid`, right?
Naively considered it can be fix, but it is out-of-scope of the thread. It is
existing code, should be discussed verified in another place.
> Are we calling CheckSlotIsInSingleUserMode from everywhere that needs
> it? We tried to check all the functions that call
> ReplicationSlotCreate, ReplicationSlotRelease, and
> update_synced_slots_inactive_since (since they all have these
> asserts). To call out a few:
>
> The pg_replication_origin_* functions don't call the Assert-ing functions.
You asked that whether we should call CheckSlotIsInSingleUserMode in
pg_replication_origin_* APIs, right? It depends on the policy. If we want to
prohibit all replication-related command, it should be. It is still under
discussion, so I did not touch.
> binary_upgrade_logical_slot_has_caught_up - Not available from the command
> line.
I found that we can in both single-user and binary-upgrade mode. At that time
the function could be called:
```
$ postgres --single -D data/ postgres -b
PostgreSQL stand-alone backend 19devel
backend> SELECT binary_upgrade_logical_slot_has_caught_up('test');
1: binary_upgrade_logical_slot_has_caught_up (typeid = 16, len = 1,
typmod = -1, byval = t)
----
LOG: starting logical decoding for slot "test"
DETAIL: Streaming transactions committing after 0/0182C640, reading WAL from
0/0182C608.
STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test');
LOG: logical decoding found consistent point at 0/0182C608
DETAIL: There are no running transactions.
STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test');
1: binary_upgrade_logical_slot_has_caught_up = "f" (typeid = 16,
len = 1, typmod = -1, byval = t)
----
```
I don't think this is a realistic case, but the check was added.
> WalSndErrorCleanup - Probably not used in single-user mode?
IIUC we won't reach in the single-user mode. Hence it was not called
intentionally.
> We also see that PostgresMain calls ReplicationSlotRelease from its
> error handler. Presumably single-user mode would be executing
> PostgresSingleUserMain instead,
ISTM, PostgresMain() would be called from the PostgresSingleUserMain().
> but still perhaps we should call
> CheckSlotIsInSingleUserMode here just as a sanity-check?
I feel this code is to release the acquired slot when ERROR was raised. Since we
have already covered the entrance, it is not needed.
[1]:
https://www.postgresql.org/message-id/CA%2BTgmobspWhoMysNHL9b3C9xi%3DOpHdBYhtAgDH4N_A2foyjN-w%40mail.gmail.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED
v5-0001-Set-ReplicationSlot-active_pid-even-in-single-use.patch
Description: v5-0001-Set-ReplicationSlot-active_pid-even-in-single-use.patch
v5-0002-Prohibit-slot-manipulation-while-in-single-user-m.patch
Description: v5-0002-Prohibit-slot-manipulation-while-in-single-user-m.patch
