Here's the sort of thing every person who writes a monitoring tool involving pg_stat_activity goes through:

1) Hurray! I know how to see what the database is doing now! Let me try counting all the connections so I can finally figure out what to set [max_connections | work_mem | other] to. 2) Wait, some of these can be "<IDLE>". That's not documented. I'll have to special case them because they don't really matter for my computation. 3) Seriously, there's another state for idle in a transaction? Just how many of these special values are there? [There's actually one more surprise after this]

The whole thing is enormously frustrating, and it's an advocacy problem--it contributes to people just starting to become serious about using PostgreSQL lowering their opinion of its suitability for their business. If this is what's included for activity monitoring, and it's this terrible, it suggest people must not have very high requirements for that.

And what you end up with to make it better is not just another few keystrokes. Here, as a common example I re-use a lot, is a decoder inspired by Munin's connection count monitoring graph:

SELECT
    waiting,
    CASE WHEN current_query='<IDLE>' THEN true ELSE false END AS idle,
CASE WHEN current_query='<IDLE> in transaction' THEN true ELSE false END AS idletransaction, CASE WHEN current_query='<insufficient privilege>' THEN false ELSE true END as visible, CASE WHEN NOT waiting AND current_query NOT IN ('<IDLE>', '<IDLE> in transaction', '<insufficient privilege>') THEN true ELSE false END AS active,
    procpid,current_query
FROM pg_stat_activity WHERE procpid != pg_backend_pid();

What percentage of people do you think get this right? Now, what does that number go to if these states were all obviously exposed booleans? As I'm concerned, this design is fundamentally flawed as currently delivered, so the concept of "breaking" it doesn't really make sense.

The fact that you can only figure all this decoding magic out through extensive trial and error, or reading the source code to [the database | another monitoring tool], is crazy. It's a much bigger problem than the fact that the pid column is misnamed, and way up on my list of things I'm just really tired of doing. Yes, we could just document all these mystery states to help, but they'd still be terrible.

This is a database; let's expose the data in a way that it's easy to slice yourself using a database query. And if we're going to fix that--which unfortunately will be breaking it relative to those already using the current format--I figure why not bundle the procpid fix into that while we're at it. It's even possible to argue that breaking that small thing will draw useful attention to the improvements in other parts of the view. Having your monitoring query break after a version upgrade is no fun. But if investigating why reveals new stuff you didn't notice in the release notes, the changes become more discoverable, albeit in a somewhat perverse way.

Putting on my stability hat instead of my "make it right" one, maybe this really makes sense to expose as a view with a whole new name. Make this new one pg_activity (there's no stats here anyway), keep the old one around as pg_stat_activity for a few releases until everyone has converted to the new one.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to