On Thu, May 6, 2021 at 9:54 AM Craig Ringer <cr...@2ndquadrant.com> wrote:
>
> On Thu, 6 May 2021 at 02:28, Andres Freund <and...@anarazel.de> wrote:
>>
>> Hi,
>>
>> On 2021-05-05 18:33:27 +0800, Craig Ringer wrote:
>> > I'm thinking of piggy-backing on the approach used in the "Get memory
>> > contexts of an arbitrary backend process" patch in order to provide access
>> > to detailed reorder buffer content statistics from walsenders on request.
>> >
>> > Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or
>> > on dynamic probes (perf, systemtap) to see what it's doing. I intend a
>> > patch soon to add a couple of fields to struct WalSnd to report some very
>> > coarse reorder buffer stats - at least oldest buffered xid, number of
>> > buffered txns, total bytes of buffered txns in memory, total bytes of
>> > buffered txns spilled to disk.
>> >
>> > But sometimes what I really want is details on the txns that're in the
>> > reorder buffer, and that's not feasible to export via always-enabled
>> > reporting like struct WalSnd. So I'm thinking that the same approach used
>> > for the memory context stats patch might work well for asking the walsender
>> > for a detailed dump of reorder buffer contents. Something like a
>> > per-buffered-txn table of txn topxid, start-lsn, most recent change lsn,
>> > number of changes, number of subxids, number of invalidations, number of
>> > catalog changes, buffer size in memory, buffer size spilled to disk.
>> >
>> > Anyone drastically opposed to the idea?
>>
>> I am doubtful. The likelihood of ending with effectively unused code
>> seems very substantial here.
>
>
> I can't rule that out, but the motivation for this proposal isn't development
> convenience. It's production support and operations.
> The reorder buffer is a black box right now, and when you're trying to answer
> the questions "what is the walsender doing," "is meaningful progress being
> made," and "what is slowing down replication" it's ... not easy.
>
> I currently rely on some fairly hairy gdb scripts, which I'm not keen on
> running on production systems if I can avoid it.
>
> I'm far from set on the approach of asking the walsender to dump a reorder
> buffer state summary to a file. But I don't think the current state of
> affairs is much fun for production use. Especially since we prevent the
> pg_stat_replication sent_lsn from going backwards, so reorder buffering can
> cause replication to appear to completely cease to progress for long periods
> unless you identify the socket and monitor traffic on it, or you intrude on
> the process with gdb.
>
> At the least it'd be helpful to have pg_stat_replication (or a new related
> auxiliary view like pg_stat_logical_decoding) show
>
> - walsender total bytes sent this session
> - number of txns processed this txn
>

You might be able to derive some of the above sorts of stats from the newly
added pg_stat_replication_slots [1].

> - number txns filtered out by output plugin this session
> - oldest xid in reorder buffer
> - reorder buffer number of txns
> - reorder buffer total size (in-memory and total inc spilled)
> - reorderbuffercommit current xid, last change lsn, total buffered size of
> current tx, total bytes of buffer processed so far within the current txn,
> and commit lsn if known, only when currently streaming a txn from
> reorderbuffercommit
>
> That way it'd be possible to tell if a logical walsender is currently
> processing a commit and get a much better sense of its progress within the
> commit.
>
> Perhaps output plugins could do some of this and expose their own custom
> views. But then each plugin would have to add its own. Plus they don't get a
> particularly good view into the reorder buffer state; they'd have a hard time
> maintaining good running stats.
>
> Some basic monitoring exposed for logical decoding and reorder buffering
> would help a lot. Does that sound more palatable?
>

Can't we think of enhancing existing views or introducing a new view to
provide such information?

[1] - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW

--
With Regards,
Amit Kapila.