Hi,

On 12/22/2015 03:49 PM, Noah Misch wrote:
On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:
I have pushed it now.  Further testing, of course, is always welcome.

While investigating stats.sql buildfarm failures, mostly on animals axolotl
and shearwater, I found that this patch (commit 187492b) inadvertently removed
the collector's ability to coalesce inquiries.  Every PGSTAT_MTYPE_INQUIRY
received now causes one stats file write.  Before, pgstat_recv_inquiry() did:

        if (msg->inquiry_time > last_statrequest)
                last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

        globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
                last_statwrite = globalStats.stats_timestamp;
                last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries waiting
in its socket receive buffer, it would ignore them as being too old once it
finished the write and resumed message processing.  Commit 187492b converted
last_statrequest to a "last_statrequests" list that we wipe after each write.

So essentially we remove the list of requests, and thus on the next round we don't know the timestamp of the last request and write the file again unnecessarily. Do I get that right?

What if we instead kept the list but marked the requests as 'invalid' so that we know the timestamp? In that case we'd be able to do pretty much exactly what the original code did (but at per-db granularity).

We'd have to cleanup the list once in a while not to grow excessively large, but something like removing entries older than PGSTAT_STAT_INTERVAL should be enough.

Actually, I think that was the idea when I wrote the patch, but apparently I got distracted and it did not make it into the code.


I modeled a machine with slow stats writes using the attached diagnostic patch
(not for commit).  It has pgstat_write_statsfiles() sleep just before renaming
the temporary file, and it logs each stats message received.  A three second
delay causes stats.sql to fail the way it did on shearwater[1] and on
axolotl[2].  Inquiries accumulate until the socket receive buffer overflows,
at which point the socket drops stats messages whose effects we later test
for.  The 3s delay makes us drop some 85% of installcheck stats messages.
Logs from a single VACUUM show receipt of five inquiries ("stats 1:") with 3s
between them:

24239 2015-12-10 04:21:03.439 GMT LOG:  connection authorized: user=nmisch 
database=test
24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 1888 + 936 = 2824
24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 2824 + 376 = 3200
24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 3200 + 824 = 4024
24239 2015-12-10 04:21:03.455 GMT LOG:  statement: vacuum pg_class
24236 2015-12-10 04:21:03.455 GMT LOG:  stats 1: 4024 + 32 = 4056
24236 2015-12-10 04:21:06.458 GMT LOG:  stats 12: 4056 + 88 = 4144
24236 2015-12-10 04:21:06.458 GMT LOG:  stats 1: 4144 + 32 = 4176
24239 2015-12-10 04:21:06.463 GMT LOG:  disconnection: session time: 
0:00:03.025 user=nmisch database=test host=[local]
24236 2015-12-10 04:21:09.486 GMT LOG:  stats 1: 4176 + 32 = 4208
24236 2015-12-10 04:21:12.503 GMT LOG:  stats 1: 4208 + 32 = 4240
24236 2015-12-10 04:21:15.519 GMT LOG:  stats 1: 4240 + 32 = 4272
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 9: 4272 + 48 = 4320
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 4320 + 936 = 5256
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 5256 + 376 = 5632
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 5632 + 264 = 5896
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 12: 5896 + 88 = 5984


As for how to fix this, the most direct translation of the old logic is to
retain last_statrequests entries that could help coalesce inquiries.I lean
toward that for an initial, back-patched fix.

That seems reasonable and I believe it's pretty much the idea I came up with above, right? Depending on how you define "entries that could help coalesce inquiries".

> It would be good, though, to
process two closely-spaced, different-database inquiries in one
pgstat_write_statsfiles() call.  We do one-database writes and all-databases
writes, but we never write "1 < N < all" databases despite the code prepared
to do so.  I tried calling pgstat_write_statsfiles() only when the socket
receive buffer empties.  That's dead simple to implement and aggressively
coalesces inquiries (even a 45s sleep did not break stats.sql), but it starves
inquirers if the socket receive buffer stays full persistently.  Ideally, I'd
want to process inquiries when the buffer empties _or_ when the oldest inquiry
is X seconds old.  I don't have a more-specific design in mind, though.

That's a nice idea, but I agree that binding the coalescing to buffer like this seems like a pretty bad idea exactly because of the starving. What might work though is if we could look at how much data is there in the buffer, process only those requests and then write the stats files.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to