On 7/23/25 02:39, Peter Geoghegan wrote:
> On Tue, Jul 22, 2025 at 8:08 PM Andres Freund <and...@anarazel.de> wrote:
>> My response was specific to Tomas' comment that for many queries, which
>> tend to be more complicated than the toys we are using here, there will
>> be CPU costs in the query.
>
> Got it. That makes sense.
>
>>                      cheaper query    expensive query
>> simple readahead       8723.209 ms       10615.232 ms
>> complex readahead      5069.438 ms        8018.347 ms
>>
>> Obviously the CPU overhead in this example didn't completely eliminate
>> the IO bottleneck, but sure reduced the difference.
>
> That's a reasonable distinction, of course.
>
>> If your assumption is that real queries are more CPU intensive that the
>> toy stuff above, e.g. due to joins etc, you can see why the really
>> attained IO depth is lower.
>
> Right.
>
> Perhaps I was just repeating myself. Tomas seemed to be suggesting
> that cases where we'll actually get a decent and completely worthwhile
> improvement with the complex patch would be naturally rare, due in
> part to these effects with CPU overhead. I don't think that that's
> true at all.
It's entirely possible my mental model is too naive, or my intuition about
the queries is wrong ...

My mental model of how this works is that if I know the time T1 to process
a page and the time T2 to complete an I/O, I can estimate how far in advance
the read for a page should have been submitted. For example, if T1 = 1ms and
T2 = 10ms, I should submit the I/O ~10 pages ahead in order to not have to
wait. That's the "minimal" queue depth.

Of course, on high-latency "cloud storage" the queue depth needs to grow,
because the time T1 to process a page is likely about the same (if it's
determined by CPU), but the I/O time T2 is much higher. So we need to issue
the I/O much sooner.

When I mentioned "complex" queries, I meant queries where processing a page
takes much more time, because the scan reads the page and passes it to other
operators in the query plan, some of which do CPU work, some trigger
synchronous I/O, etc. Which means T1 grows, and the "minimal" queue depth
decreases.

Which part of this is not quite right?

-- 
Tomas Vondra
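
PS: To put rough numbers on the model above, here is a minimal sketch of
that arithmetic. The T1/T2 values and the min_prefetch_distance() helper are
made up for illustration only, not measured or taken from any patch:

#include <stdio.h>
#include <math.h>

/*
 * Minimal prefetch distance needed to hide I/O latency: if processing a
 * page takes T1 ms of CPU time and an I/O takes T2 ms to complete, the
 * read for a page has to be issued at least ceil(T2 / T1) pages ahead.
 */
static int
min_prefetch_distance(double t1_ms, double t2_ms)
{
	return (int) ceil(t2_ms / t1_ms);
}

int
main(void)
{
	/* local storage: T1 = 1ms, T2 = 10ms => ~10 pages ahead */
	printf("T1=1ms, T2=10ms  -> %d pages ahead\n",
		   min_prefetch_distance(1.0, 10.0));

	/* "cloud storage": same CPU cost per page, much higher latency */
	printf("T1=1ms, T2=100ms -> %d pages ahead\n",
		   min_prefetch_distance(1.0, 100.0));

	/* "complex" query: T1 grows, so the minimal depth shrinks */
	printf("T1=5ms, T2=10ms  -> %d pages ahead\n",
		   min_prefetch_distance(5.0, 10.0));

	return 0;
}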