Thanks, Tom, for detailing the issue.
>So ... how many is "a bunch"?
More than 85.

>Looking at the code, it seems like it'd be possible for a sufficiently
>aggressive spawner of incoming connections to reach the
>MaxLivePostmasterChildren limit.  While the postmaster would correctly
>reject additional connection attempts after that, what it would not do
>is ensure that any child slots are left for new parallel worker processes.
>So we could hypothesize that the error you're seeing in the log is from
>failure to spawn a parallel worker process, due to being out of child
>slots.

We have max_parallel_workers_per_gather = 4 enabled; it was set roughly 20 days
before we ran into this issue.
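
If the errors really are parallel workers failing to launch for lack of child
slots, one way to test that hypothesis would be to temporarily take parallel
query out of the picture and see whether the errors stop. This is only a sketch
of a diagnostic step, not something we have actually run:

    -- Current parallel-query settings on this instance.
    SHOW max_parallel_workers_per_gather;
    SHOW max_worker_processes;

    -- Disable parallel plans for a session (or cluster-wide via ALTER SYSTEM)
    -- purely as a test; it does not fix the underlying problem.
    SET max_parallel_workers_per_gather = 0;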


>However, given that max_connections = 500, MaxLivePostmasterChildren()
>would be 1000-plus.  This would mean that reaching this condition would
>require *at least* 500 concurrent connection-attempts-that-haven't-yet-
>been-rejected, maybe well more than that if you didn't have close to
>500 legitimately open sessions.  That seems like a lot, enough to suggest
>that you've got some pretty serious bug in your client-side logic.

The errors below were observed in the Postgres log file after the crash:

ERROR:  xlog flush request is not satisfied -- seen for a couple of tables; we
initiated VACUUM FULL on those tables and the error went away after that.
ERROR:  right sibling's left-link doesn't match: block 273660 links to
273500 instead of expected 273661 in index -- observed while running VACUUM
FREEZE on the database; we dropped that index and created a new one (a sketch
of the rebuild is below).
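
Roughly, the drop and recreate looked like this; the index, table, and column
names below are placeholders, not the real ones from our database, and the use
of CONCURRENTLY is just one option:

    -- Drop the btree index reported in the "right sibling's left-link" error
    -- and build a fresh one; CONCURRENTLY avoids blocking writes meanwhile.
    DROP INDEX IF EXISTS some_corrupt_idx;
    CREATE INDEX CONCURRENTLY some_corrupt_idx ON some_table (some_column);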

Observations:

The vacuum freeze analyze job initiated through a cron job gets stuck at the
database end. pg_cancel_backend() and pg_terminate_backend() are not able to
terminate those stuck processes; only restarting the database clears them. I
am thinking this is happening due to corruption (if that is true, how can I
detect it? pg_dump?). Is there any way to overcome this problem?
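
Two diagnostics that might help before resorting to a restart, sketched under
the assumption that the stuck vacuum shows up as an ordinary backend in
pg_stat_activity and that the contrib amcheck extension can be installed:

    -- See what the stuck vacuum backends are actually waiting on.
    SELECT pid, state, wait_event_type, wait_event, xact_start, query
    FROM pg_stat_activity
    WHERE query ILIKE '%vacuum%';

    -- amcheck can detect btree corruption such as broken sibling links
    -- without dumping the whole database.
    CREATE EXTENSION IF NOT EXISTS amcheck;
    SELECT c.relname, bt_index_check(c.oid)
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_am am ON am.oid = c.relam
    WHERE am.amname = 'btree'
      AND i.indisvalid
      AND c.relpersistence = 'p';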

Would migrating the database to a new instance (pg_basebackup and switching
over to the new instance) solve this issue?


On Fri, 4 Oct 2019 at 03:49, Tom Lane <t...@sss.pgh.pa.us> wrote:

> bhargav kamineni <bhargavpostg...@gmail.com> writes:
> >> What was the database doing just before the FATAL line?
>
> > Postgres was rejecting a bunch of connections from a user who is having a
> > connection limit set. that was the the FATAL error that i could see in
> log
> > file.
> >  FATAL,53300,"too many connections for role ""user_app"""
>
> So ... how many is "a bunch"?
>
> Looking at the code, it seems like it'd be possible for a sufficiently
> aggressive spawner of incoming connections to reach the
> MaxLivePostmasterChildren limit.  While the postmaster would correctly
> reject additional connection attempts after that, what it would not do
> is ensure that any child slots are left for new parallel worker processes.
> So we could hypothesize that the error you're seeing in the log is from
> failure to spawn a parallel worker process, due to being out of child
> slots.
>
> However, given that max_connections = 500, MaxLivePostmasterChildren()
> would be 1000-plus.  This would mean that reaching this condition would
> require *at least* 500 concurrent connection-attempts-that-haven't-yet-
> been-rejected, maybe well more than that if you didn't have close to
> 500 legitimately open sessions.  That seems like a lot, enough to suggest
> that you've got some pretty serious bug in your client-side logic.
>
> Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
> number of acceptable connections is identical to the number of allowed
> child processes; it needs to be less, by the number of background
> processes we want to support.  But it seems like a darn hard-to-hit bug,
> so I'm not quite sure that that explains your observation.
>
>                         regards, tom lane
>
