Hello all,

We have encountered an issue with our PostgreSQL 15.5 installation. The
problem is that
PostgreSQL server periodically falls into a state when it accepts new
connections but
doesn't execute any queries. The session which runs a query hangs and
cannot be terminated
via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
has state "S" and it's
not terminated even if the PostgreSQL master process is stopped. No matter
how we connect to
the database - both TCP and unix socket sessions hang but it seems that
existing sessions can
still execute queries (once we succeeded to connect using pgBouncer which
seemed to use an
existing connection to the database).

Here is a stack trace from gdb for one of the frozen sessions:

(gdb) bt 10
#0  0x00007f6d31dbd378 in poll () from /lib64/libc.so.6
#1  0x00007f6d3286aee1 in pqSocketCheck.part.2 () from
/usr/pgsql-15/lib/libpq.so.5
#2  0x00007f6d3286b054 in pqWaitTimed () from /usr/pgsql-15/lib/libpq.so.5
#3  0x00007f6d32867848 in PQgetResult () from /usr/pgsql-15/lib/libpq.so.5
#4  0x0000000000411320 in ExecQueryAndProcessResults
(query=query@entry=0x23a68b0
"select 1;", elapsed_msec=elapsed_msec@entry=0x7ffc2b5840a8,
svpt_gone_p=svpt_gone_p@entry=0x7ffc2b5840a7,
    is_watch=is_watch@entry=false, opt=opt@entry=0x0,
printQueryFout=printQueryFout@entry=0x0) at common.c:1426
#5  0x000000000040feb9 in SendQuery (query=0x23a68b0 "select 1;") at
common.c:1117
#6  0x000000000040627b in main (argc=<optimized out>, argv=<optimized out>)
at startup.c:384

We're using glibc-2.28-236.0.1.el8.7.x86_64 on this machine and PostgreSQL
15.5:

postgres=# select version();
                                                 version

---------------------------------------------------------------------------------------------------------
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
20210514 (Red Hat 8.5.0-20), 64-bit
(1 row)

We've tried to recreate a cluster from scratch using a logical dump on new
hardware but it didn't
help though. And unfortunately we could not reproduce the issue, it looks
like it occurs randomly and
when it happens only PostgreSQL restart helps. Also we have number of
machines that run the
same version of PostgreSQL but we have the problem only with one cluster so
maybe it somehow
related to queries that are specific to this cluster.

We also run patroni 2.1.4 on this cluster, for reference (I'm not sure if
it can be related). We checked PostgreSQL logs,
of course - there are no any messages that could be related to the issue as
well.

We will really appreciate any help, thanks!

-- 
With best regards, Andrei Zhidenkov.

Reply via email to