I wrote:
> Anyway it's only a guess.  It could well be that that machine was simply
> so heavily loaded that the stats collector couldn't respond fast enough.
> I'm just wondering whether there's an unrecognized bug lurking here.

Still meditating on this ... and it strikes me that the pgstat.c code is really uncommunicative about problems. In particular, pgstat_read_statsfile_timestamp and pgstat_read_statsfile don't complain at all about being unable to read a stats file. It seems to me that the only "expected" case is ENOENT (and even that isn't really expected, in normal operation). Surely we should at least elog(LOG) any other failure condition?

Another place that could probably do with elog(LOG) is where pgstat_write_statsfile resets last_statrequest in case it's in the future. That shouldn't ever happen. While the reset is probably a good thing for robustness, wouldn't logging it be a good idea?

Lastly, backend_read_statsfile is designed to send an inquiry message every time through the loop, ie, every 10 msec. This is said to be in case the stats collector drops one. But is this enough to flood the collector and make things worse? I wonder if there should be some backoff there.

			regards, tom lane
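
A rough standalone sketch of two of the ideas above, offered only as an illustration and not as a patch against pgstat.c. The function names (open_statsfile_logged, next_inquiry_delay_ms) and the 640 msec cap are made up for this example, and fprintf(stderr) stands in for elog(LOG) so that it compiles outside the backend. The first function stays quiet on ENOENT and reports any other open failure; the second doubles the inquiry interval from 10 msec up to the assumed cap instead of resending every 10 msec.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constants for illustration; not taken from pgstat.c. */
#define INQUIRY_BASE_DELAY_MS 10	/* the 10 msec retry interval */
#define INQUIRY_MAX_DELAY_MS  640	/* assumed upper bound on the delay */

/*
 * Open a stats file for reading.  A missing file (ENOENT) is treated as
 * the one quiet case; any other failure is reported.  fprintf(stderr) is
 * a stand-in for elog(LOG).
 */
FILE *
open_statsfile_logged(const char *path)
{
	FILE	   *fp;

	assert(path != NULL);
	fp = fopen(path, "rb");
	if (fp == NULL && errno != ENOENT)
		fprintf(stderr, "LOG: could not open statistics file \"%s\": %s\n",
				path, strerror(errno));
	return fp;
}

/*
 * Delay to wait before sending inquiry number "attempt" (0-based).
 * The delay doubles on each attempt and is capped at the maximum.
 */
int
next_inquiry_delay_ms(int attempt)
{
	int			delay = INQUIRY_BASE_DELAY_MS;

	assert(attempt >= 0);
	while (attempt-- > 0 && delay < INQUIRY_MAX_DELAY_MS)
		delay *= 2;
	return delay < INQUIRY_MAX_DELAY_MS ? delay : INQUIRY_MAX_DELAY_MS;
}
```

With these values a waiting backend would send inquiries at 10, 20, 40, ... msec intervals and settle at 640 msec, so a dropped message is still retried but a slow collector is not hit with a hundred requests per second from each waiter. Whether a cap of that size is appropriate is an open question.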