On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> Hi,
>
> While doing some benchmarking, I've run into a fairly strange issue with OOM
> breaking LaunchParallelWorkers() after the restart. What I see happening is
> this:
>
> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>
>     nworkers=8 nworkers_launched=8
>
> 2) the query does a Hash Aggregate, but ends up eating much more memory due
> to an n_distinct underestimate (see [1] from 2015 for details), and gets killed
> by OOM
>
> 3) the server restarts, the query is executed again, but this time we get in
> LaunchParallelWorkers
>
>     nworkers=8 nworkers_launched=0
>
> There's nothing else running on the server, and there definitely should be
> free parallel workers.
>
> 4) The query gets killed again, and on the next execution we get
>
>     nworkers=8 nworkers_launched=8
>
> again, although not always. I wonder whether the exact impact depends on OOM
> killing the leader or worker, for example.

I don't know what's going on but I think I have seen this once or
twice myself while hacking on test code that crashed.  I wonder if the
DSM_CREATE_NULL_IF_MAXSEGMENTS case is being triggered because
the DSM control is somehow confused?
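To make that hypothesis a bit more concrete, here is a toy model in C. It is only a sketch: ToyControl, TOY_MAX_SEGMENTS and the toy_* functions are hypothetical names made up for illustration, not PostgreSQL's dsm.c or parallel.c. It assumes a fixed table of segment slots and a create call that, like the behaviour DSM_CREATE_NULL_IF_MAXSEGMENTS asks for, reports "no segment" instead of raising an error when every slot is taken. If slots were left marked in use across a crash restart, the caller would get no segment and launch no workers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not PostgreSQL code: a control structure with
 * a fixed number of slots for dynamic shared memory segments. */
#define TOY_MAX_SEGMENTS 4

typedef struct ToyControl
{
    bool in_use[TOY_MAX_SEGMENTS];
} ToyControl;

/* Claim a free slot; return -1 (standing in for a NULL segment) instead
 * of failing hard when all slots are already in use. */
static int
toy_dsm_create_null_if_max(ToyControl *ctl)
{
    for (int i = 0; i < TOY_MAX_SEGMENTS; i++)
    {
        if (!ctl->in_use[i])
        {
            ctl->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release a previously claimed slot. */
static void
toy_dsm_detach(ToyControl *ctl, int slot)
{
    assert(slot >= 0 && slot < TOY_MAX_SEGMENTS);
    ctl->in_use[slot] = false;
}

/* Hypothetical caller: without a segment, no workers can be launched,
 * so the "launched" count drops to 0 even though workers were requested. */
static int
toy_launch_workers(ToyControl *ctl, int nworkers)
{
    int slot = toy_dsm_create_null_if_max(ctl);

    if (slot < 0)
        return 0;

    /* In this toy, the segment is released once the query finishes. */
    toy_dsm_detach(ctl, slot);
    return nworkers;
}
```

With a clean table, toy_launch_workers(&ctl, 8) returns 8 every time; after marking every slot in use (as a crash that is not cleaned up might leave them), it returns 0. Whether anything like this actually happens after a crash restart is exactly the open question.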

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)