Hey Thomas,
after a few months, we started having this issue again.
So we reverted the work_mem parameter to 600MB instead of 2GB.
But the issue is still there. A query hit a segmentation fault, the DB
went into recovery mode, and our app went read-only for a few minutes.

I understand we can increase max_connections so we can have many more
segments.

My question is: is there a way to see how many segments we have reached?
Currently we have max_connections = 220, so per your formula (64 + 2 * 220)
we have about 500 shared memory segments.
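Just to double-check that arithmetic on our side, this is a minimal sketch
that only reproduces your formula from the current setting (it does not
read any internal counter):

    -- rough estimate of the DSM slot limit from 64 + 2 * max_connections
    SELECT 64 + 2 * current_setting('max_connections')::int
           AS approx_dsm_slot_limit;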
We would like to increase max_connections to 300 or 400, but it would be
great to know whether there is a way to make sure this will actually solve
the issue, since changing it requires a restart of the service.
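As far as I know there is no built-in view that reports how many DSM
segments are currently in use, so as a rough proxy I guess we could at
least watch how many parallel worker backends are active at a given
moment, something like:

    -- each parallel query can hold several DSM segments, so this only
    -- gives a feel for concurrent parallel-query load, not an exact
    -- segment count
    SELECT count(*) AS active_parallel_workers
    FROM pg_stat_activity
    WHERE backend_type = 'parallel worker';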

I know you were also talking about redesigning this part of PostgreSQL. Do
you know if anything has changed in any of the newer versions after 11.5?

Thanks a lot,
Nicola

On Thu, Sep 12, 2019 at 01:01 Thomas Munro <thomas.mu...@gmail.com> wrote:

> On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu <nicola.co...@gmail.com>
> wrote:
> > If the error persist I will try to revert the work_mem.
> > Thanks a lot
>
> Hi Nicola,
>
> It's hard to say exactly what the cause of the problem is in your case
> and how to avoid it, without knowing what your query plans look like.
> PostgreSQL allows 64 + 2 * max_connections segments to exist at a time,
> and it needs a number of them that depends on work_mem (in the case of
> Parallel Hash Join and Parallel Bitmap Index Scan), and also depends
> on the number of Gather nodes that appear in the plan, which in some
> unusual cases can result from partitioning.
>
> I've seen people reaching this error by running a lot of parallel
> queries concurrently.  If that's the cause, then you can definitely
> get some relief by turning work_mem down, or by turning
> max_connections up (even though you don't want to allow more
> connections -- because it influences the formula for deciding on the
> DSM segment limit).  We should probably adjust some of the internal
> constants to give us more slots, to avoid that problem, as discussed
> here:
>
>
> https://www.postgresql.org/message-id/flat/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com
>
> I've also seen people reaching this error by somehow coming up with
> plans that had a very large number of Gather nodes in them,
> corresponding to partitions; that's probably a bad plan (it'd
> presumably be better to terminate parallelism higher up in the plan,
> but these plans do seem to exist in the wild; I don't recall exactly
> why).  I think we need a bit of a redesign so that if there are
> multiple Gather nodes, they share the same main DSM segment, instead
> of blowing through this limit.
>
> --
> Thomas Munro
> https://enterprisedb.com
>
