31.03.2026 13:54, Michael Paquier wrote:
> On Tue, Mar 31, 2026 at 10:00:00AM +0300, Alexander Lakhin wrote:
>> So the backend is not completely stuck, but CommitTransactionCommand()
>> may take more than 5 seconds under some circumstances (maybe it's worth
>> investigating which exactly).
> One could blame slow hardware, difficult to say, and I'm puzzled by
> these periodic bumps that don't seem to happen elsewhere.
I managed to get the backtrace of such a sluggish backend:
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
Id Target Id Frame
* 1 Thread 0x3fb2a4c620 (LWP 564194) "postgres" 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#0 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#1 0x0000002abef79444 in XLogFileClose () at xlog.c:3672
#2 0x0000002abef7cc66 in XLogWrite (WriteRqst=..., tli=tli@entry=1, flexible=flexible@entry=false) at xlog.c:2356
#3 0x0000002abef7dbfc in XLogFlush (record=33561688) at xlog.c:2892
#4 0x0000002abef77976 in RecordTransactionCommit () at xact.c:1516
#5 CommitTransaction () at xact.c:2379
#6 0x0000002abef78938 in CommitTransactionCommandInternal () at xact.c:3224
#7 0x0000002abef78acc in CommitTransactionCommand () at xact.c:3185
#8 0x0000003fb2a3ed88 in initialize_worker_spi (table=0x2abf8bf358) at worker_spi.c:132
#9 worker_spi_main (main_arg=<optimized out>) at worker_spi.c:181
....
(Three test runs produced the same stack trace.)
I think the time spent in posix_fadvise64() under XLogFileClose() can
explain the slow CommitTransactionCommand() and why it does not happen
every time. As for other animals, I guess they experience the same bumps,
just without exceeding the 5-second limit (50 tries). So, as I understand
it, the failure requires both slow storage and initialize_worker_spi() ->
CommitTransactionCommand() reaching XLogFileClose().
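To check the slow-storage part in isolation, something like the following
could be used. It is only a rough sketch and is not taken from the
PostgreSQL sources: the function name time_fadvise_dontneed() and its
parameters are mine, the 16MB default merely mirrors the default WAL
segment size, and it does not reproduce the flags or write pattern the
backend uses. It writes a file and times a POSIX_FADV_DONTNEED call on it,
which is the call the backtrace above is sitting in.

```python
import os
import tempfile
import time


def time_fadvise_dontneed(directory, size_bytes=16 * 1024 * 1024):
    """Write size_bytes of zeros to a temporary file in `directory`, then
    return the seconds spent in posix_fadvise(POSIX_FADV_DONTNEED).

    Hypothetical helper for illustration only; it approximates, but does
    not replicate, what happens when a WAL segment file is closed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        chunk = b"\0" * 8192  # assumption: 8kB writes, like the default WAL block size
        written = 0
        while written < size_bytes:
            written += os.write(fd, chunk)

        # Time only the fadvise call; offset 0 and length 0 cover the whole file.
        start = time.monotonic()
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed


if __name__ == "__main__":
    # Assumption: pass the directory of interest as needed; the system
    # temporary directory is used here only so the script runs anywhere.
    seconds = time_fadvise_dontneed(tempfile.gettempdir())
    print(f"posix_fadvise(DONTNEED) took {seconds:.3f} s")
```

Running it against the filesystem that holds the test's data directory on
the affected machine, and on a faster one for comparison, might show
whether this call alone can come close to the 5-second limit.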
Best regards,
Alexander