> On Feb 25, 2026, at 18:45, Jakub Wartak <[email protected]> wrote:
>
> On Wed, Feb 25, 2026 at 10:15 AM Chao Li <[email protected]> wrote:
>
> Hi Chao, thanks for review.
>
>> A few comments for v3:
>>
>> 1 - syncrep.c
> [..]
>> + proc_die_sender_pid == 0 ? 0
>> :
>> + errdetail("Signal
>> sent by PID %lld, UID %lld.",
>> + (long
>> long)proc_die_sender_pid, (long long)proc_die_sender_uid)
>> + ));
>> ```
>>
>> Here errdetail is used twice. I guess the second conditional one should be
>> errhint.
>>
>> 2 - syncrep.c
> [..]
>> Same as comment 1.
>
> You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
> way too fast... fixed.
>
>> 3
>> ```
>> +volatile uint32 proc_die_sender_pid = 0;
>> +volatile uint32 proc_die_sender_uid = 0;
>> ```
>>
>> These two globals are only written in the signal handler, I think they
>> should be sig_atomic_t to ensure atomic writes.
>
> Well the problem that sig_atomic_t is int and we need at least uint32 and
> I couldn't find better way. I think that 4 bytes writes will be mostly
> always atomic (for 64-bits it would depend on
> PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)
>
> Yet I moved those little below, so it's more aligned to the other uses.
>
>> 4
>> ```
>> - errmsg("terminating walreceiver
>> process due to administrator command")));
>> + errmsg("terminating walreceiver
>> process due to administrator command"),
>> + proc_die_sender_pid == 0 ? 0 :
>> + errdetail("Signal sent by
>> PID %lld, UID %lld.",
>> + (long
>> long)proc_die_sender_pid, (long long)proc_die_sender_uid)
>> + ));
>> ```
>>
>> Why do we need to format pid and uid in “long long” format? I searched over
>> the source tree, the current postmaster.c just formats pid as int (%d):
>> ```
>> /* in parent, successful fork */
>> ereport(DEBUG2,
>> (errmsg_internal("forked new %s, pid=%d socket=%d",
>>
>> GetBackendTypeDesc(bn->bkend_type),
>> (int) pid, (int)
>> client_sock->sock)));
>> ```
>
> Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
> later had pid_t, I read somewhere about 64-bit pids, and so on) vs
> platform-agnostic hell of putting that into printf). Possible I was
> overthinking it
> and I have reverted it to just using %d with that uint32. BTW I've also found:
> elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
>
> v4 attached. Thanks again for the review.
>
> -J.
> <v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then
assign values to them upon receiving SIGTERM, and never reset them, which
assumes a process must die upon SIGTERM. Is the assumption true? I guess not.
If a process receives SIGTERM and not die immediately, then die for other
reason, then it may report a misleading PID and UID. So, I think we may need to
reset proc_die_sender_pid/uid somewhere. For example, in ProcessInterrupts(),
copy them to local variables and reset them to 0, then use the local variables
for ereport().
2.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
ereport(WARNING,
(errmsg("canceling wait for synchronous
replication due to user request"),
- errdetail("The transaction has already
committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already
committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by
PID %d, UID %d.",
+
proc_die_sender_pid, proc_die_sender_uid)
+ ));
```
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We
should keep consistency, maybe all use errhint.
3.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
QueryCancelPending = false;
ereport(WARNING,
(errmsg("canceling wait for synchronous
replication due to user request"),
- errdetail("The transaction has already
committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already
committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by
PID %d, UID %d.",
+
proc_die_sender_pid, proc_die_sender_uid)
+ ));
SyncRepCancelWait();
break;
}
```
I don’t think the query cancel case relates to SIGTERM, so we don’t need to log
PID and UID here.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/