Re: index prefetching

Andres Freund Tue, 12 Aug 2025 12:49:24 -0700

Hi,

On 2025-08-12 18:53:13 +0200, Tomas Vondra wrote:
> I'm running some tests looking for these weird changes, not just with
> the patches, but on master too. And I don't think b4212231 changed the
> situation very much.
> 
> FWIW this issue is not caused by the index prefetching patches; I can
> reproduce it with master (on b227b0bb4e032e19b3679bedac820eba3ac0d1cf
> from yesterday). So maybe we should split this into a separate thread.
> 
> Consider for example the dataset built by create.sql: it's randomly
> generated, but designed to be correlated, though not perfectly. The
> table is ~3.7GB, and this is a cold run (caches dropped + restart).
> 
> Anyway, a simple range query looks like this:
> 
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a ASC;
> 
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using idx on t
>    (actual time=0.584..433.208 rows=1048576.00 loops=1)
>    Index Cond: ((a >= 16336) AND (a <= 49103))
>    Index Searches: 1
>    Buffers: shared hit=7435 read=50872
>    I/O Timings: shared read=332.270
>  Planning:
>    Buffers: shared hit=78 read=23
>    I/O Timings: shared read=2.254
>  Planning Time: 3.364 ms
>  Execution Time: 463.516 ms
> (10 rows)
> 
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a DESC;
> 
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan Backward using idx on t
>    (actual time=0.566..22002.780 rows=1048576.00 loops=1)
>    Index Cond: ((a >= 16336) AND (a <= 49103))
>    Index Searches: 1
>    Buffers: shared hit=36131 read=50872
>    I/O Timings: shared read=21217.995
>  Planning:
>    Buffers: shared hit=82 read=23
>    I/O Timings: shared read=2.375
>  Planning Time: 3.478 ms
>  Execution Time: 22231.755 ms
> (10 rows)
> 
> That's a pretty massive difference ... this is on my laptop, and the
> timing changes quite a bit, but it's always a multiple of the first
> query with forward scan.
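Both plans read exactly the same number of buffers (read=50872), so the gap is in how long each read took, not in how much was read. A rough average per read can be worked out from the two plans. This is only a mean, and it assumes the "I/O Timings: shared read" figure covers just those 50872 buffer reads:

```python
# Numbers copied from the two EXPLAIN (ANALYZE) outputs quoted above.
SHARED_READS = 50872  # "Buffers: ... read=50872", identical for both scans

io_time_ms = {
    "forward": 332.270,     # "I/O Timings: shared read=332.270"
    "backward": 21217.995,  # "I/O Timings: shared read=21217.995"
}


def per_read_us(total_ms: float, reads: int) -> float:
    """Average I/O time per buffer read, in microseconds (a mean, not a latency distribution)."""
    return total_ms * 1000.0 / reads


for direction, ms in io_time_ms.items():
    print(f"{direction:8s} {per_read_us(ms, SHARED_READS):8.1f} us/read")

# How much more total read time the backward scan spent for the same buffers.
ratio = io_time_ms["backward"] / io_time_ms["forward"]
print(f"backward/forward read time: {ratio:.1f}x")
```

That works out to roughly 6.5us per read forward versus roughly 417us backward, about 64x. An average of a few microseconds would be consistent with most forward-scan reads being satisfied from the kernel page cache rather than waiting on the device.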


I suspect what you're mainly seeing here is that the OS can do readahead for
us for forward scans, but not for backward scans.  Indeed, if I look at
iostat, the forward scan shows:

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme6n1       3352.00    400.89     0.00   0.00    0.18   122.47    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.62  47.90

whereas the backward scan shows:

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme6n1      10958.00     85.57     0.00   0.00    0.06     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.69  63.80

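Note the different read sizes: the forward scan averages ~122kB per request (rareq-sz 122.47) at ~400MB/s, while the backward scan issues plain 8kB requests (rareq-sz 8.00) at ~86MB/s, despite issuing more than three times as many requests per second...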
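The average request size iostat reports can be cross-checked from the other two read columns: bandwidth divided by request rate. The sketch below assumes iostat's kB and MB are 1024-based (which is what sysstat uses, and the numbers agree) and the default 8kB PostgreSQL block size:

```python
# Read columns copied from the two iostat -x samples above.
samples = {
    "forward": {"r_s": 3352.00, "rMB_s": 400.89},
    "backward": {"r_s": 10958.00, "rMB_s": 85.57},
}

BLCKSZ_KB = 8  # assumption: default PostgreSQL block size of 8kB


def avg_request_kb(rmb_s: float, r_s: float) -> float:
    """Average kB per read request: (MB/s * 1024 kB/MB) / (requests/s)."""
    return rmb_s * 1024.0 / r_s


for name, s in samples.items():
    kb = avg_request_kb(s["rMB_s"], s["r_s"])
    # Express the request size in PostgreSQL blocks as well.
    print(f"{name:8s} {kb:7.2f} kB/request = {kb / BLCKSZ_KB:5.1f} blocks")
```

This reproduces the rareq-sz column: about 122.47kB (~15 blocks) per request for the forward scan, versus 8kB (a single block) for the backward scan.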



> I did look into pg_aios, but there are only 8kB requests in both cases. I
> didn't have time to look closer yet.

That's what we'd expect, right? There's nothing on master that'd perform read
combining for index scans...
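To make "read combining" concrete: without it, every 8kB buffer is requested on its own, which is what pg_aios shows. The idea is to merge runs of adjacent block numbers into one larger request. The following is a toy sketch, not PostgreSQL's implementation; `combine_reads` and the 16-block cap are hypothetical choices for illustration:

```python
from typing import List, Tuple


def combine_reads(blocks: List[int], max_blocks: int = 16) -> List[Tuple[int, int]]:
    """Hypothetical combiner: merge ascending, adjacent block numbers into
    (start_block, nblocks) requests of at most max_blocks blocks each."""
    requests: List[Tuple[int, int]] = []
    for blk in blocks:
        if requests:
            start, n = requests[-1]
            # Extend the pending request only if blk directly follows it.
            if blk == start + n and n < max_blocks:
                requests[-1] = (start, n + 1)
                continue
        requests.append((blk, 1))
    return requests


forward = list(range(100, 132))    # 32 blocks in ascending order
backward = list(reversed(forward))  # the same blocks, descending

print(combine_reads(forward))        # [(100, 16), (116, 16)]
print(len(combine_reads(backward)))  # 32
```

Note that this toy only merges ascending runs, so a descending block sequence still produces one 8kB request per block; handling backward scans would need the combiner to recognize descending runs too.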

Greetings,

Andres Freund
