On 08/25/2011 04:57 PM, Tomas Vondra wrote:
> (b) sends bgwriter stats (so that the buffers_checkpoint is updated)

The idea behind only updating the stats in one chunk, at the end, is that it makes one specific thing easier to do. Let's say you're running a monitoring system that is grabbing snapshots of pg_stat_bgwriter periodically. If you want to figure out how much work a checkpoint did, you only need two points of data to compute that right now. Whenever you see either of the checkpoint count numbers increase, you just subtract off the previous sample; now you've got a delta for how many buffers that checkpoint wrote out. You can derive the information about the buffer counts involved that appears in the logs quite easily this way. The intent was to make that possible to do, so that people can figure this out without needing to parse the log data.
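As a rough illustration of that two-sample calculation, here is a minimal Python sketch. It assumes each snapshot is a plain dict holding the checkpoints_timed, checkpoints_req, and buffers_checkpoint columns of pg_stat_bgwriter; the function name checkpoint_delta and the sample numbers are made up for the example, and how the snapshots get collected is left out.

```python
def checkpoint_delta(prev, curr):
    """Compare two pg_stat_bgwriter snapshots (dicts keyed by column name).

    Returns None if neither checkpoint counter increased between the
    samples; otherwise returns how many checkpoints finished and how
    many buffers they wrote.
    """
    finished = (
        (curr["checkpoints_timed"] - prev["checkpoints_timed"])
        + (curr["checkpoints_req"] - prev["checkpoints_req"])
    )
    if finished <= 0:
        return None
    return {
        "checkpoints": finished,
        "buffers_written": curr["buffers_checkpoint"] - prev["buffers_checkpoint"],
    }


# Illustrative values only, not taken from a real server.
prev = {"checkpoints_timed": 10, "checkpoints_req": 2, "buffers_checkpoint": 50000}
curr = {"checkpoints_timed": 11, "checkpoints_req": 2, "buffers_checkpoint": 53200}

delta = checkpoint_delta(prev, curr)
print(delta)
```

This only attributes the delta to a single checkpoint when the samples are taken more often than checkpoints complete, and it relies on buffers_checkpoint being updated in one chunk at the end of each checkpoint, as described above.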

Spreading out the updates defeats that idea. It also makes it possible to see the buffer writes more in real-time, as they happen. You can make a case for both approaches having their use cases; the above is just summarizing the logic behind why it's done the way it is right now. I don't think many people are actually doing things with this to the level where their tool will care. The most popular consumer of pg_stat_bgwriter data I see is Munin graphing changes, and I don't think it will care either way.

Giving people the option of doing it the other way is a reasonable idea, but I'm not sure there's enough use case there to justify adding a GUC just for that. My next goal here is to eliminate checkpoint_segments, not to add yet another tunable extremely few users would ever touch.

As for throwing more log data out, I'm not sure what new analysis you're thinking of that it allows. I/O gets increasingly spiky as you zoom in on it; averaging over a shorter period can easily end up providing less insight about trends. If anything, I spend more time summarizing the data that's already there, rather than wanting to break it down. It's already providing way too much detail for most people. Customers tell me they don't care to see checkpoint stats unless they're across a day or more of sampling, so even the current "once every ~5 minutes" is way more info than they want. I have all this log parsing code and things that look at pg_stat_bgwriter to collect that data and produce higher level reports. Lots of it would break if any of this patch were added and people turned it on. I imagine other log/stat parsing programs might suffer issues too. That's your other hurdle for change here: the new analysis techniques have to be useful enough to justify the downstream tool disruption, some of which is inevitable.

If you have an idea for how to use this extra data for something useful, let's talk about what that is and see if it's possible to build it in instead. This problem is harder than it looks, mainly because the way the OS caches writes here makes trying to derive hard numbers from what the background writer is doing impossible. When the database writes things out and when they actually get written to disk are not the same event. The actual write often happens during the sync phase, and not being able to track that beast is where I see the most problems. The write phase, the easier part to instrument in the database, is pretty boring. That's why the last extra logging I added here focused on adding visibility to the sync activity instead.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
