I wrote:
> Anyway it's only a guess.  It could well be that that machine was simply
> so heavily loaded that the stats collector couldn't respond fast enough.
> I'm just wondering whether there's an unrecognized bug lurking here.

Still meditating on this ... and it strikes me that the pgstat.c code
is really uncommunicative about problems.  In particular, 
pgstat_read_statsfile_timestamp and pgstat_read_statsfile don't complain
at all about being unable to read a stats file.  It seems to me that the
only "expected" case is ENOENT (and even that isn't really expected, in
normal operation).  Surely we should at least elog(LOG) any other
failure condition?
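
To make that concrete, here is a rough standalone sketch of the policy
I have in mind.  It is not the actual pgstat.c code: the helper names
(should_log_open_failure, open_statsfile_logged, log_msg) are made up
for illustration, and fprintf(stderr) stands in for elog(LOG).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for elog(LOG, ...); log_count lets a caller see if we logged. */
static int	log_count = 0;

static void
log_msg(const char *what, const char *path, int err)
{
	log_count++;
	fprintf(stderr, "LOG:  %s \"%s\": %s\n", what, path, strerror(err));
}

/* Hypothetical policy: only a missing file (ENOENT) is passed over silently. */
static bool
should_log_open_failure(int err)
{
	return err != ENOENT;
}

/*
 * Hypothetical helper: open a stats file for reading, logging any
 * failure other than ENOENT.  Returns NULL on failure.
 */
static FILE *
open_statsfile_logged(const char *path)
{
	FILE	   *fp;
	int			save_errno;

	assert(path != NULL);

	fp = fopen(path, "rb");
	if (fp == NULL)
	{
		save_errno = errno;		/* save before anything can clobber it */
		if (should_log_open_failure(save_errno))
			log_msg("could not open statistics file", path, save_errno);
	}
	return fp;
}
```

The same test would presumably apply to short reads and corrupt-file
cases further down, which have no errno to filter on and so would
always be logged.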

Another place that could probably do with elog(LOG) is where
pgstat_write_statsfile resets last_statrequest in case it's in the
future.  That shouldn't ever happen.  While the reset is probably
a good thing for robustness, wouldn't logging it be a good idea?
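
Something along these lines, as a standalone sketch only.  The helper
name clamp_future_request and the int64_t timestamp type are
assumptions for illustration, and the real code may compare against a
different reference time than the "now" used here; fprintf(stderr)
again stands in for elog(LOG).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Microseconds; an assumed stand-in for the real timestamp type. */
typedef int64_t TimestampSketch;

/*
 * Hypothetical helper: if *last_request lies after "now", pull it back
 * to "now" and say so in the log.  Returns true when a reset was needed.
 */
static bool
clamp_future_request(TimestampSketch *last_request, TimestampSketch now)
{
	assert(last_request != NULL);

	if (*last_request > now)
	{
		fprintf(stderr,
				"LOG:  last_statrequest is %lld usec in the future, resetting\n",
				(long long) (*last_request - now));
		*last_request = now;
		return true;
	}
	return false;
}
```

The reset itself is unchanged; the only addition is that it leaves a
trace when it fires.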

Lastly, backend_read_statsfile is designed to send an inquiry message
every time through the loop, i.e., every 10 msec.  This is said to be in
case the stats collector drops one.  But is this enough to flood the
collector and make things worse?  I wonder if there should be some
backoff there.
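
One possible shape for that, purely as a sketch: keep polling on every
iteration, but resend the inquiry only at exponentially growing gaps.
The struct, the function names, and the cap of 64 iterations are all
assumptions of mine, not anything in pgstat.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RESEND_GAP 64		/* assumed cap, in loop iterations */

typedef struct InquiryBackoff
{
	int			next_send;		/* iteration at which to send next inquiry */
	int			gap;			/* gap to apply after the next send */
} InquiryBackoff;

static void
backoff_init(InquiryBackoff *b)
{
	assert(b != NULL);
	b->next_send = 0;			/* always send on the first iteration */
	b->gap = 1;
}

/*
 * Returns true if an inquiry should be sent on this loop iteration.
 * Each send schedules the next one "gap" iterations later, then
 * doubles the gap until it reaches MAX_RESEND_GAP.
 */
static bool
backoff_should_send(InquiryBackoff *b, int iteration)
{
	assert(b != NULL);

	if (iteration < b->next_send)
		return false;
	b->next_send = iteration + b->gap;
	if (b->gap < MAX_RESEND_GAP)
		b->gap *= 2;
	return true;
}
```

With these numbers, 100 trips through the loop (about a second at 10
msec each) produce 7 inquiries, at iterations 0, 1, 3, 7, 15, 31 and
63, instead of 100.  A dropped message still gets retried quickly at
first.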

                        regards, tom lane

