On Tue, 9 Aug 2022 at 06:19, Drouvot, Bertrand <bdrou...@amazon.com> wrote:
>
> What do you think about adding a function in core PG to provide such
> functionality? (means being able to retrieve all the stats (+ eventually
> add some filtering) without the need to connect to each database).

I'm working on it myself too. I'll post a patch for discussion in a bit.

I was aiming more at a C function that extensions could use directly rather than an SQL function, though I suppose that with the former in place it would be simple enough to implement the latter on top of it (though it would have to be one for each stat type, I guess).

The reason I want a C function is that I'm trying to get as far as I can without a connection to a database, without a transaction, without accessing the catalog, and as much as possible without taking locks. I think this is important for making monitoring highly reliable and low impact on production. It's also kind of fundamental to accessing stats for objects from other databases, since we won't have easy access to the catalogs for the other databases.

The main problem with my current code is that I'm accessing the shared memory hash table directly. This means I'm possibly introducing lock contention on the shared memory hash table. I'm thinking of separating the shared memory hash scan from the metric scan so the list can be built quickly, minimizing the time the lock is held. We could possibly also rebuild that list at a lower frequency than the metrics gathering, at the cost that new objects might not show up instantly.

I have a few things I would like to suggest for future improvements to this infrastructure. I haven't polished the details yet, but the main thing I think I'm missing is the catalog name for the object. I don't want to have to fetch it from the catalog, and in any case I think it would generally be useful and might regularize the replication slot handling too.

I also think it would be nice to have a change counter for every stat object, or perhaps a change time. Prometheus wouldn't be able to make use of it, but other monitoring software might be able to receive only metrics that have changed since the last update, which would really help on databases with large numbers of mostly static objects.
Even on typical databases there are tons of built-in objects (especially functions) that are probably never getting updates.

--
greg
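
To make the "separate the hash scan from the metric scan" idea a bit more concrete, here is a rough, standalone sketch. It is not PostgreSQL code: StatTable, StatKey, snapshot_keys() and read_metric() are all made-up names, and a plain pthread mutex over a fixed array stands in for the shared memory hash table and its lock. The only point is the shape: phase 1 holds the lock just long enough to copy keys, and phase 2 takes it briefly per key, so any slow work (formatting, exporting) happens with no lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Everything here is a hypothetical stand-in, not a PostgreSQL API. */
#define TABLE_SIZE 8

typedef struct StatKey { uint32_t dboid; uint32_t objoid; } StatKey;

typedef struct StatEntry { int in_use; StatKey key; uint64_t counter; } StatEntry;

typedef struct StatTable
{
    pthread_mutex_t lock;               /* stands in for the shared hash lock */
    StatEntry       entries[TABLE_SIZE];
} StatTable;

static int key_equal(StatKey a, StatKey b)
{
    return a.dboid == b.dboid && a.objoid == b.objoid;
}

void table_init(StatTable *t)
{
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    for (size_t i = 0; i < TABLE_SIZE; i++)
        t->entries[i].in_use = 0;
}

/* Writer side: returns 0 on success, -1 if the table is full. */
int table_add(StatTable *t, StatKey key, uint64_t value)
{
    int rc = -1;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TABLE_SIZE; i++)
    {
        if (!t->entries[i].in_use)
        {
            t->entries[i].in_use = 1;
            t->entries[i].key = key;
            t->entries[i].counter = value;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Phase 1: copy only the keys while holding the lock; returns the count. */
size_t snapshot_keys(StatTable *t, StatKey *out, size_t max)
{
    size_t n = 0;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TABLE_SIZE && n < max; i++)
        if (t->entries[i].in_use)
            out[n++] = t->entries[i].key;
    pthread_mutex_unlock(&t->lock);
    return n;
}

/*
 * Phase 2: fetch one metric by key, holding the lock only for the lookup.
 * Returns 1 if found, 0 if the object disappeared since the snapshot.
 */
int read_metric(StatTable *t, StatKey key, uint64_t *value)
{
    int found = 0;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TABLE_SIZE; i++)
    {
        if (t->entries[i].in_use && key_equal(t->entries[i].key, key))
        {
            *value = t->entries[i].counter;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}
```

A collector would call snapshot_keys() occasionally and read_metric() for each cached key on every scrape. The key list can go stale in both directions: new objects are missed until the next snapshot, and dropped objects simply make read_metric() return 0.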