Re: EXPLAIN: showing ReadStream / prefetch stats

Tomas Vondra Mon, 16 Mar 2026 11:26:34 -0700

On 3/16/26 18:29, Melanie Plageman wrote:
> On Mon, Mar 16, 2026 at 3:08 AM Lukas Fittl <[email protected]> wrote:
>>
>> I'm 50/50 if hiding this behind a new option really makes sense - if
>> its cheap enough to always capture, why not always show it?
>>
>> e.g. we could consider doing with this what we did with BUFFERS
>> recently, which is to enable it by default. If someone finds that too
>> visually busy, they could still do IO OFF.
> 
> I like this idea.
>


Maybe. I think it's important not to overwhelm the users with too much
information, and so my intuition was to keep it OFF by default. But if
the agreement is that it should be opt-out, I can live with that.

>> This also does make me wonder a bit what we should do about I/O
>> timings. Conceptually they'd belong closer to IO now than BUFFERS..
> 
> I could see moving the timings to the I/O line.
> 

Moving the I/O timings to the "IO" line seems reasonable to me.

>>> The second line "I/O" is about the I/O requests actually issued - how
>>> many times we had to wait for the block (when we get to process it),
>>> average size of a request (in BLCKSZ blocks), and average number of
>>> in-progress requests.
>>
>> I wonder if we could somehow consolidate this into one line for the
>> text format? (specifically, moving prefetch into "I/O" at the end?)
> 
> Next release, I'm hopeful we'll get write combining in and then want
> to include write IO details here. Not a reason to avoid making it one
> line now, though. However, I think it will be a very long line...
> 

Let's not make the lines too long. The distance information is not
really about I/O, it's about an internal queue / heuristics.

FWIW I assume we'd want to track some of the metrics (average IO size,
number of IOs, ...) separate for reads and writes, so I'm not sure those
will go to the same line too.

>> I'm also not sure if "max" is really that useful, vs capacity?
> 
> I find it very helpful. If you keep increasing
> effective_io_concurrency and io_combine_limit and don't see the max
> increasing, I think it is helpful to know what the configured possible
> limit would be vs what the read stream actually got you up to. And
> because you can set effective_io_concurrency and io_combine_limit per
> query, it is nice to know what this value was at the time the query
> was run.
> 

I'm not sure. Shouldn't the average distance give us about the same
thing? Sure, it'll never be exactly the maximum, but if the prefetch
behavior is reasonably stable, it should not be too far.

It'll be different if the prefetch distance varies a lot (e.g. because
parts of the table have vastly different cache hit ratios). But then
it's hard to interpret the "max" too, I think.

So maybe just average + capacity might suffice?

>> I feel like something is off about the complexity of having each node
>> type ferry back the information. e.g. when you're implementing the
>> support for index prefetching, it'll require a bunch more changes. In
>> my mind, there is a reason we have a related problem that we solved
>> with the current pgBufferUsage, instead of dealing with that on a
>> per-node basis. I really feel we should have a more generic way of
>> dealing with this.
> <--snip-->
>> I've attached a prototype of how that could look like (apply the other
>> patch set first, v8, see commit fest entry [1] - also attached a
>> preparatory refactoring of using "Instrumentation" for parallel query
>> reporting, which avoids having individual structs there).
> 
> The patch footprint is _much_ nicer with your stack-based
> instrumentation. Very cool. I'll leave it to Tomas whether he wants to
> create a dependency on your big project a few weeks before feature
> freeze, though.
> 

I don't know. I have not looked at the stack-based instrumentation yet,
so I have no idea how complex or how close to committable it is.

>> Its also worth noting that this would make it trivial to output this
>> information for utility commands that have read stream support, or
>> show aggregate statistics in pg_stat_statements/etc.
> 
> Yes this would be a major bonus. Even for currently possible users,
> this hasn't been extended to TID Range Scan -- which involves more LOC
> and boilerplate.
> 

Oh, I forgot about the TID scan ...

I agree doing this in a way that does not require changes to TAM
interface would be nice.


regards

-- 
Tomas Vondra

Re: EXPLAIN: showing ReadStream / prefetch stats

Reply via email to