Hi Michael,

Thank you for your feedback!
On Thu, Mar 26, 2026 at 7:59 PM Michael Paquier <[email protected]> wrote:

> On Thu, Mar 26, 2026 at 05:29:25PM -0700, SATYANARAYANA NARLAPURAM wrote:
> > Postgres already has pg_stat_wal for aggregate WAL volume (bytes,
> > full-page images, buffers), and pg_walinspect (superuser access
> > required) for post-hoc forensic analysis of individual WAL segments.
> > But I don't see a lightweight observability tool that answers in
> > real time which record types are responsible for the WAL.
> > Additionally, pg_walinspect runs against on-disk WAL files, which is
> > expensive. This view will be useful for monitoring systems to poll
> > cheaply.
> >
> > *Use cases:*
> > WAL volume investigation: see which record types dominate WAL
> > generation in real time without touching disk.
> > Monitoring integration: Prometheus/Grafana can poll the view to
> > track WAL composition over time and alert on anomalies.
> > Replication tuning: identify whether WAL volume is dominated by
> > data changes, index maintenance, FPIs, or vacuum activity to guide
> > tuning.
> > Extension debugging: custom WAL resource managers get visibility
> > automatically.
>
> Why is it useful to have access to this information in live for
> monitoring purposes, divided by RMGR?

A per-RMGR breakdown is essentially a real-time X-ray of what the
database is actually doing. Existing views help us understand what
queries were submitted; this view would show what physical operations
resulted from them.

> What do you define as an
> anomaly in this context and what can be changed on the server side to
> get out of an anomaly, based on the definition you would give to it?

A few examples: the HOT update ratio dropped, B-tree page splits
spiked, some application started adding logical decoding messages, GIN
generates more WAL than the corresponding heap modifications, high
lock record counts, etc.

> The current WAL and IO stats are directly useful because they provide
> numbers about flush, read and write quantity and timings.
> These are
> useful metrics for benchmarking.
>
> This proposal only informs about the number of records, and we have a
> various number of record types that have a variable length, that can
> influence the distribution of the data written on disk.

Yeah, that is a downside; I didn't add sizes in order to keep the
overhead low.

> As a whole, I am doubtful that this information is worth counting in
> live, eating a stats kind ID. One could also implement a background
> worker that provides more advanced aggregate stats outside the WAL
> insert path, with a custom stats kind. No need to have that into
> core, especially if the code that increments the stats eats more and
> more cycles.

Your argument makes sense to me. I was a bit hesitant to write a
background worker because of the potential side effects of tailing the
WAL. Let me try a different approach by implementing on-demand
sniffing of the WAL, which can be done as an extension without changes
to core. Do you have thoughts on making it a contrib module versus
modifying the existing pg_walinspect to accommodate these
requirements?

Thanks,
Satya
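P.S. If we end up going through pg_walinspect, an on-demand per-RMGR
breakdown can already be approximated today with its existing
functions. A rough sketch (assumes pg_walinspect from PG 15+, with its
documented output columns; the start LSN below is purely illustrative
and would come from the monitoring system's last poll):

```sql
-- On-demand "WAL sniffing": aggregate WAL records by resource manager
-- over an LSN range, using pg_walinspect instead of a new stats kind.
-- Requires: CREATE EXTENSION pg_walinspect;
SELECT resource_manager,
       count(*)           AS n_records,
       sum(record_length) AS total_bytes,
       sum(fpi_length)    AS fpi_bytes
FROM pg_get_wal_records_info('0/01000000', pg_current_wal_lsn())
GROUP BY resource_manager
ORDER BY total_bytes DESC;
```

This keeps the WAL insert path untouched and pays the decoding cost
only when someone actually asks, at the price of reading WAL that may
no longer be in wal_buffers.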
