Hi all,

We're seeing the FATAL error message from the subject pop up in our logs 
at regular intervals, but I haven't been able to pinpoint what is causing 
it. I'm hoping for some insights here.

We run a PostgreSQL 11.9 server on CentOS 7, within a VMware environment:
 PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

The package was installed from the PGDG repository.

I'm not even sure I should be worried; there doesn't appear to be any impact on the server's functioning, but it does say 'FATAL'.
What we're seeing are lines like these two instances:

2020-09-30 21:45:09.999 CEST [2375] 172.30.2.25 asdf STATEMENT:  select count(*) from "util_asdf"."v_something_something2"
2020-09-30 21:45:15.018 CEST [2375] 172.30.2.25 asdf ERROR:  canceling statement due to user request
2020-09-30 21:45:15.018 CEST [2375] 172.30.2.25 asdf STATEMENT:  select count(*) from "dm_asdf"."asdf_func"
2020-09-30 21:45:20.027 CEST [2375] 172.30.2.25 asdf ERROR:  canceling statement due to user request
2020-09-30 21:45:20.027 CEST [2375] 172.30.2.25 asdf STATEMENT:  select count(*) from "dm_asdf"."asdf_func"
2020-09-30 22:27:51.422 CEST [20270] 172.30.2.26 selfservice_sales ERROR:  canceling statement due to user request
2020-09-30 22:27:51.422 CEST [20270] 172.30.2.26 selfservice_sales STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:51.422 CEST [30649]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:51.422 CEST [30649]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:51.422 CEST [30648]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:51.422 CEST [30648]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:51.422 CEST [30647]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:51.422 CEST [30647]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:51.422 CEST [30646]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:51.422 CEST [30646]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:51.422 CEST [30645]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:51.422 CEST [30645]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:51.435 CEST [6482]   LOG:  background worker "parallel worker" (PID 30646) exited with exit code 1
2020-09-30 22:27:51.435 CEST [6482]   LOG:  background worker "parallel worker" (PID 30645) exited with exit code 1
2020-09-30 22:27:51.435 CEST [6482]   LOG:  background worker "parallel worker" (PID 30647) exited with exit code 1
2020-09-30 22:27:51.435 CEST [6482]   LOG:  background worker "parallel worker" (PID 30648) exited with exit code 1
2020-09-30 22:27:51.435 CEST [6482]   LOG:  background worker "parallel worker" (PID 30649) exited with exit code 1
2020-09-30 22:27:56.446 CEST [20270] 172.30.2.26 selfservice_sales ERROR:  canceling statement due to user request
2020-09-30 22:27:56.446 CEST [20270] 172.30.2.26 selfservice_sales STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:56.446 CEST [30659]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:56.446 CEST [30659]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:56.446 CEST [30658]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:56.446 CEST [30658]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:56.446 CEST [30657]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:56.446 CEST [30657]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:56.446 CEST [30656]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:56.446 CEST [30656]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:56.446 CEST [30655]   FATAL:  terminating connection due to administrator command
2020-09-30 22:27:56.446 CEST [30655]   STATEMENT:  select count(*) from "dm_b2b"."prlwytzkofskiv1"
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel worker" (PID 30655) exited with exit code 1
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel worker" (PID 30656) exited with exit code 1
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel worker" (PID 30657) exited with exit code 1
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel worker" (PID 30658) exited with exit code 1
2020-09-30 22:27:56.459 CEST [6482]   LOG:  background worker "parallel worker" (PID 30659) exited with exit code 1
2020-09-30 22:43:08.459 CEST [8055] 172.30.2.25 selfservice_prd ERROR:  schema "somethingelse" does not exist at character 71

Apparently, something is sending SIGTERM to our PostgreSQL processes. I know that I'm not doing that, certainly not at those hours, and I set up this system and am its only DBA.

Advice I found on the Internet is to use SystemTap with a probe script, but the scripts I found only displayed the PIDs of the processes involved, not their names, which wasn't much help in figuring out who was responsible. So I made an attempt (I have no experience with stap) at modifying one to print the process names of both the signal sender and the target:

/*
 * killsnoop-nd.stp     Trace process signals.
 *                      For Linux, uses SystemTap (non-debuginfo).
 *
 * Copyright (C) 2015 Brendan Gregg.
(etc)
 */
global target;
global signal;

probe begin
{
        printf("%-6s %-12s %-5s %-6s %6s\n",
               "FROM", "COMMAND", "SIG", "TO", "COMMAND");
}

/* On entry to kill(), remember the target PID and signal number per thread. */
probe nd_syscall.kill
{
        target[tid()] = uint_arg(1);
        signal[tid()] = uint_arg(2);
}

/* On return, report SIGTERM (15) sends with sender and target process names. */
probe nd_syscall.kill.return
{
        if (signal[tid()] == 15 && target[tid()] != 0) {
                printf("%-6d %-12s %-5d %-6d %12s\n",
                       pid(), execname(),
                       signal[tid()],
                       target[tid()],
                       pid2execname(target[tid()]));
        }
        delete target[tid()];
        delete signal[tid()];
}
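
For reference, this is roughly how I invoke it (running it via sudo, with the systemtap packages installed; the output file name is just an arbitrary choice):

$ sudo stap killsnoop-nd.stp | tee /tmp/sigterm-trace.out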


The output from last night was:

FROM   COMMAND      SIG   TO     COMMAND
30068  systemd-udevd 15    14151  systemd-udevd
30068  systemd-udevd 15    14836  systemd-udevd
(...)
6482   postmaster   15    30649    postmaster
6482   postmaster   15    30648    postmaster
6482   postmaster   15    30647    postmaster
6482   postmaster   15    30646    postmaster
6482   postmaster   15    30645    postmaster
6482   postmaster   15    30659    postmaster
6482   postmaster   15    30658    postmaster
6482   postmaster   15    30657    postmaster
6482   postmaster   15    30656    postmaster
6482   postmaster   15    30655    postmaster
6482   postmaster   15    2065     postmaster
6482   postmaster   15    2064     postmaster
6482   postmaster   15    2063     postmaster

Several of these TO PIDs match those in the PostgreSQL log.
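
A quick way to cross-check is to grep the PostgreSQL log for the TO PIDs from the stap output, along these lines (the log file name here is just a placeholder; the real one depends on log_directory and log_filename):

$ grep -E '30645|30646|30647|30648|30649' postgresql-2020-09-30.log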

$ ps aux | grep 6482
postgres  6482  0.0  0.5 51755776 1043872 ?    Ss   Sep28   3:25 /usr/pgsql-11/bin/postmaster -D /data/11/data

Am I correct in concluding that the postmaster is killing its own processes? If so, what is going on here? And, more importantly, what do I do about it?

Regards,
Alban Hertroys.

P.S. I'm mailing from my work account, as this is work-related. You may remember me from my private account.

