On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
> I wonder whether when multiple processes are demanding statsfile updates,
> there's some misbehavior that causes them to suck CPU away from the stats
> collector and/or convince it that it doesn't need to write anything.
> There are odd things in the logs in some of these events. For example in
> today's failure on hamster,
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
> there are two client-visible wait-timeout warnings, one each in the
> gist and spgist tests. But in the postmaster log we find these in
> fairly close succession:
>
> [549c38ba.724d:2] WARNING: pgstat wait timeout
> [549c39b1.73e7:10] WARNING: pgstat wait timeout
> [549c38ba.724d:3] WARNING: pgstat wait timeout
>
> Correlating these with other log entries shows that the first and third
> are from the autovacuum launcher while the second is from the gist test
> session. So the spgist failure failed to get logged, and in any case the
> big picture is that we had four timeout warnings occurring in a pretty
> short span of time, in a parallel test set that's not all that demanding
> (12 parallel tests, well below our max). Not sure what to make of that.

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time. This makes me think that
> somebody simplified away some logic we maybe should have kept. But if
> we're going to leave it like this, we could replace the DBWriteRequest
> data structure with a simple OID list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was split
into multiple files. Alvaro, Tomas?

I wondered for a second whether the split could be responsible somehow,
but there are reports of that in older backbranches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
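
For reference, the log_checkpoints request above would amount to a one-line
configuration change. This is only a minimal sketch: it assumes the setting
is applied through the postgresql.conf of the instance the regression tests
run against, and how the buildfarm animal actually injects extra settings is
not shown in this thread.

```
# postgresql.conf fragment (assumed placement, see note above)
# Log each checkpoint, including the time spent writing and syncing files,
# so a slow sync phase can be correlated with the pgstat wait timeouts.
log_checkpoints = on
```

With this enabled, each checkpoint completion message in the postmaster log
reports separate write and sync durations, which is the figure the guess
about 'sync' times would need.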