On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> Hi,
>
> While doing some benchmarking, I've ran into a fairly strange issue with OOM
> breaking LaunchParallelWorkers() after the restart. What I see happening is
> this:
>
> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>
>     nworkers=8 nworkers_launched=8
>
> 2) the query does a Hash Aggregate, but ends up eating much more memory due
> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
> by OOM
>
> 3) the server restarts, the query is executed again, but this time we get in
> LaunchParallelWorkers
>
>     nworkers=8 nworkers_launched=0
>
> There's nothing else running on the server, and there definitely should be
> free parallel workers.
>
> 4) The query gets killed again, and on the next execution we get
>
>     nworkers=8 nworkers_launched=8
>
> again, although not always. I wonder whether the exact impact depends on OOM
> killing the leader or worker, for example.

I don't know what's going on, but I think I have seen this once or twice
myself while hacking on test code that crashed. I wonder whether the
DSM_CREATE_NULL_IF_MAXSEGMENTS case is being triggered because the DSM
control is somehow confused.

-- 
Thomas Munro
http://www.enterprisedb.com
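
To make the hypothesis above concrete, here is a toy model in C. It is not PostgreSQL source, and every name in it (ToyControl, the toy_* functions, TOY_MAX_SEGMENTS) is invented for illustration. It assumes that segment creation reports "no free slot" to the caller instead of raising an error, and that the caller then carries on with zero workers. Whether that is what actually happens after the OOM restart is unverified.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit on concurrently allocated segments. */
#define TOY_MAX_SEGMENTS 4

/* Toy stand-in for a shared control structure that tracks segment slots. */
typedef struct ToyControl
{
    bool in_use[TOY_MAX_SEGMENTS];
} ToyControl;

/* Put the control structure into a clean state (what a restart should do). */
void
toy_reset(ToyControl *ctl)
{
    memset(ctl, 0, sizeof(*ctl));
}

/* Model a "confused" control structure: every slot still looks taken. */
void
toy_mark_all_in_use(ToyControl *ctl)
{
    for (int i = 0; i < TOY_MAX_SEGMENTS; i++)
        ctl->in_use[i] = true;
}

/*
 * Claim a free slot. Returns the slot index, or -1 when all slots look
 * taken. The -1 plays the role of a "return NULL instead of erroring out"
 * result.
 */
int
toy_segment_create(ToyControl *ctl)
{
    for (int i = 0; i < TOY_MAX_SEGMENTS; i++)
    {
        if (!ctl->in_use[i])
        {
            ctl->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Give a slot back. */
void
toy_segment_release(ToyControl *ctl, int slot)
{
    ctl->in_use[slot] = false;
}

/*
 * Toy launch path: without a segment there is nothing for workers to
 * attach to, so none are launched. Returns the number "launched".
 */
int
toy_launch_workers(ToyControl *ctl, int nworkers)
{
    int slot = toy_segment_create(ctl);

    if (slot < 0)
        return 0;               /* silent fallback: leader works alone */

    /* ... workers would run here ... */
    toy_segment_release(ctl, slot);
    return nworkers;
}
```

With a clean control structure, toy_launch_workers(&ctl, 8) returns 8. After toy_mark_all_in_use() it returns 0 without any error, which is the same shape as the nworkers=8 nworkers_launched=0 symptom. If something like this were the cause, I'd expect stale slot accounting to show up in the control data after the restart.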