On Thu, May 6, 2021 at 9:54 AM Craig Ringer <cr...@2ndquadrant.com> wrote:
>
> On Thu, 6 May 2021 at 02:28, Andres Freund <and...@anarazel.de> wrote:
>>
>> Hi,
>>
>> On 2021-05-05 18:33:27 +0800, Craig Ringer wrote:
>> > I'm thinking of piggy-backing on the approach used in the "Get memory
>> > contexts of an arbitrary backend process" patch in order to provide access
>> > to detailed reorder buffer content statistics from walsenders on request.
>> >
>> > Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or
>> > on dynamic probes (perf, systemtap) to see what it's doing. I intend a
>> > patch soon to add a couple of fields to struct WalSnd to report some very
>> > coarse reorder buffer stats - at least oldest buffered xid, number of
>> > buffered txns, total bytes of buffered txns in memory, total bytes of
>> > buffered txns spilled to disk.
>> >
>> > But sometimes what I really want is details on the txns that're in the
>> > reorder buffer, and that's not feasible to export via always-enabled
>> > reporting like struct WalSnd. So I'm thinking that the same approach used
>> > for the memory context stats patch might work well for asking the walsender
>> > for a detailed dump of reorder buffer contents. Something like a
>> > per-buffered-txn table of txn topxid, start-lsn, most recent change lsn,
>> > number of changes, number of subxids, number of invalidations, number of
>> > catalog changes, buffer size in memory, buffer size spilled to disk.
>> >
>> > Anyone drastically opposed to the idea?
>>
>> I am doubtful. The likelihood of ending with effectively unused code
>> seems very substantial here.
>
>
> I can't rule that out, but the motivation for this proposal isn't development 
> convenience. It's production support and operations. The reorder buffer is a 
> black box right now, and when you're trying to answer the questions "what is 
> the walsender doing," "is meaningful progress being made," and "what is 
> slowing down replication" it's ... not easy.
>
> I currently rely on some fairly hairy gdb scripts, which I'm not keen on 
> running on production systems if I can avoid it.
>
> I'm far from set on the approach of asking the walsender to dump a reorder 
> buffer state summary to a file. But I don't think the current state of 
> affairs is much fun for production use. Especially since we prevent the 
> pg_stat_replication sent_lsn from going backwards, so reorder buffering can 
> cause replication to appear to completely cease to progress for long periods 
> unless you identify the socket and monitor traffic on it, or you intrude on 
> the process with gdb.
>
> At the least it'd be helpful to have pg_stat_replication (or a new related 
> auxiliary view like pg_stat_logical_decoding) show
>
> - walsender total bytes sent this session
> - number of txns processed this session
>

You might be able to derive some of the above stats from the newly
added pg_stat_replication_slots view [1].

> - number of txns filtered out by output plugin this session
> - oldest xid in reorder buffer
> - reorder buffer number of txns
> - reorder buffer total size (in-memory and total inc spilled)
> - ReorderBufferCommit current xid, last change lsn, total buffered size of
> current tx, total bytes of buffer processed so far within the current txn,
> and commit lsn if known, only when currently streaming a txn from
> ReorderBufferCommit
>
> That way it'd be possible to tell if a logical walsender is currently 
> processing a commit and get a much better sense of its progress within the 
> commit.
>
> Perhaps output plugins could do some of this and expose their own custom 
> views. But then each plugin would have to add its own. Plus they don't get a 
> particularly good view into the reorder buffer state; they'd have a hard time 
> maintaining good running stats.
>
> Some basic monitoring exposed for logical decoding and reorder buffering 
> would help a lot. Does that sound more palatable?
>

Can't we enhance the existing views, or introduce a new view, to
provide such information?
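
To make the question concrete, here is a rough sketch (in Python, for
illustration only) of how the per-transaction details listed earlier in
the thread could be rolled up into the coarse counters a view might
expose. Nothing here exists in PostgreSQL: BufferedTxn, BufferSummary,
and summarize are hypothetical names, and picking the "oldest"
transaction by start LSN rather than by xid is an assumption made
because xids wrap around.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BufferedTxn:
    """Hypothetical row: one transaction held in the reorder buffer."""
    top_xid: int
    start_lsn: int        # LSN represented as a 64-bit integer
    last_change_lsn: int
    num_changes: int
    mem_bytes: int        # bytes currently held in memory
    spilled_bytes: int    # bytes spilled to disk


@dataclass(frozen=True)
class BufferSummary:
    """Hypothetical coarse counters derived from the detailed rows."""
    oldest_xid: Optional[int]
    num_txns: int
    mem_bytes: int
    total_bytes: int      # in-memory plus spilled


def summarize(txns: Iterable[BufferedTxn]) -> BufferSummary:
    rows = list(txns)
    # Assumption: the oldest txn is the one with the smallest start LSN;
    # comparing xids numerically would break across xid wraparound.
    oldest = min(rows, key=lambda t: t.start_lsn, default=None)
    mem = sum(t.mem_bytes for t in rows)
    spilled = sum(t.spilled_bytes for t in rows)
    return BufferSummary(
        oldest_xid=oldest.top_xid if oldest is not None else None,
        num_txns=len(rows),
        mem_bytes=mem,
        total_bytes=mem + spilled,
    )
```

The point of the sketch is only that the coarse numbers are cheap
aggregates of the detailed ones, so a single view could plausibly carry
both levels of detail.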

[1] - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW

-- 
With Regards,
Amit Kapila.

