Hi,

On 2024-02-09 18:00:01 +0300, Alexander Lakhin wrote:
> I've managed to reproduce this issue (which still persists:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-02-04%2001%3A53%3A44
> ) and saw that it's not checkpointer, but walsender is hanging:
How did you reproduce this?

> And I see the walsender process still running (I've increased the timeout
> to keep the test running and to connect to the process in question), with
> the following stack trace:
> #0  0x00007fe4feac3d16 in epoll_wait (epfd=5, events=0x55b279b70f38,
>     maxevents=1, timeout=timeout@entry=-1)
>     at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x000055b278b9ab32 in WaitEventSetWaitBlock (set=set@entry=0x55b279b70eb8,
>     cur_timeout=cur_timeout@entry=-1,
>     occurred_events=occurred_events@entry=0x7ffda5ffac90,
>     nevents=nevents@entry=1) at latch.c:1571
> #2  0x000055b278b9b6b6 in WaitEventSetWait (set=0x55b279b70eb8,
>     timeout=timeout@entry=-1,
>     occurred_events=occurred_events@entry=0x7ffda5ffac90,
>     nevents=nevents@entry=1,
>     wait_event_info=wait_event_info@entry=100663297) at latch.c:1517
> #3  0x000055b278a3f11f in secure_write (port=0x55b279b65aa0,
>     ptr=ptr@entry=0x55b279bfbd08, len=len@entry=21470) at be-secure.c:296
> #4  0x000055b278a460dc in internal_flush () at pqcomm.c:1356
> #5  0x000055b278a461d4 in internal_putbytes (s=s@entry=0x7ffda5ffad3c "E\177",
>     len=len@entry=1) at pqcomm.c:1302

So it's the issue that we wait effectively forever to send a FATAL. I've
previously proposed that we should not block sending out fatal errors, given
that blocking there allows clients to prevent graceful restarts and a lot of
other things.

Greetings,

Andres Freund
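For readers following along: the pattern behind the stack trace is generic.
The walsender's socket buffer is full because the peer stopped reading, and
the wait (timeout=-1 in WaitEventSetWait above) never returns. The standalone
sketch below is not PostgreSQL code; it just illustrates, with plain poll()
and send() on a socketpair, how a bounded timeout in that wait would let the
sender give up instead of hanging forever. The function names and the 1-second
timeout are purely illustrative.

/*
 * Illustrative sketch only, not PostgreSQL code.  A non-blocking writer
 * whose peer never reads eventually fills the socket buffer; waiting for
 * writability with timeout -1 then blocks indefinitely, while a bounded
 * timeout lets the sender abandon the message.
 */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Write "len" bytes to a non-blocking socket, waiting at most "timeout_ms"
 * milliseconds (forever if -1) whenever the buffer is full.
 * Returns 0 on success, -1 on error or timeout.
 */
static int
send_with_timeout(int sock, const char *buf, size_t len, int timeout_ms)
{
	size_t		sent = 0;

	while (sent < len)
	{
		ssize_t		n = send(sock, buf + sent, len - sent, 0);

		if (n > 0)
		{
			sent += (size_t) n;
			continue;
		}
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		{
			struct pollfd pfd = {.fd = sock, .events = POLLOUT};
			int			rc = poll(&pfd, 1, timeout_ms);

			if (rc == 0)
			{
				/* timed out; with timeout_ms == -1 we would hang here */
				fprintf(stderr, "send timed out after %d ms\n", timeout_ms);
				return -1;
			}
			if (rc < 0 && errno != EINTR)
				return -1;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		return -1;				/* hard error */
	}
	return 0;
}

int
main(void)
{
	int			sv[2];
	char		payload[64 * 1024];

	/* socketpair stands in for the client connection */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;
	fcntl(sv[0], F_SETFL, O_NONBLOCK);
	memset(payload, 'E', sizeof(payload));	/* stand-in for the FATAL message */

	/*
	 * Nobody ever reads from sv[1], so the buffer fills up.  With a one
	 * second timeout the loop terminates; with -1 (the behaviour in the
	 * stack trace above) the send would never return.
	 */
	while (send_with_timeout(sv[0], payload, sizeof(payload), 1000) == 0)
		;

	close(sv[0]);
	close(sv[1]);
	return 0;
}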