On Sat, 4 May 2024 at 06:55, Cary Huang <cary.hu...@highgo.ca> wrote:
> With syncscan enabled, the "table_block_parallelscan_nextpage()" would
> return the next block since the end of the first tid rangescan instead of the
> correct start block that should be scanned. I see that single tid rangescan
> does not have SO_ALLOW_SYNC set, so I figure syncscan should also be
> disabled in parallel case. With this change, then it would be okay to call
> heap_setscanlimits() in parallel case, so I added this call back to
> heap_set_tidrange() in both serial and parallel cases.

This now calls heap_setscanlimits() for the parallel version, it's
just that table_block_parallelscan_nextpage() does nothing to obey
those limits.

The only reason the code isn't reading the entire table is due to the
optimisation in heap_getnextslot_tidrange() which returns false when
the ctid goes out of range. i.e, this code:

/*
* When scanning forward, the TIDs will be in ascending order.
* Future tuples in this direction will be higher still, so we can
* just return false to indicate there will be no more tuples.
*/
if (ScanDirectionIsForward(direction))
    return false;

If I comment out that line, I see all pages are accessed:

postgres=# explain (analyze, buffers) select count(*) from a where
ctid >= '(0,1)' and ctid < '(11,0)';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=18.80..18.81 rows=1 width=8) (actual
time=33.530..36.118 rows=1 loops=1)
   Buffers: shared read=4425
   ->  Gather  (cost=18.78..18.79 rows=2 width=8) (actual
time=33.456..36.102 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=4425
         ->  Partial Aggregate  (cost=18.78..18.79 rows=1 width=8)
(actual time=20.389..20.390 rows=1 loops=3)
               Buffers: shared read=4425
               ->  Parallel Tid Range Scan on a  (cost=0.01..16.19
rows=1035 width=0) (actual time=9.375..20.349 rows=829 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid <
'(11,0)'::tid))
                     Buffers: shared read=4425 <----  this is all
pages in the table instead of 11 pages.

With that code still commented out, the non-parallel version still
won't read all pages due to the setscanlimits being obeyed.

postgres=# set max_parallel_workers_per_gather=0;
SET
postgres=# explain (analyze, buffers) select count(*) from a where
ctid >= '(0,1)' and ctid < '(11,0)';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=45.07..45.08 rows=1 width=8) (actual
time=0.302..0.302 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Tid Range Scan on a  (cost=0.01..38.86 rows=2485 width=0)
(actual time=0.019..0.188 rows=2486 loops=1)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=11


If I put that code back in, how many pages are read depends on the
number of parallel workers as workers will keep running with higher
page numbers and heap_getnextslot_tidrange() will just (inefficiently)
filter those out.

max_parallel_workers_per_gather=2;
               ->  Parallel Tid Range Scan on a  (cost=0.01..16.19
rows=1035 width=0) (actual time=0.191..0.310 rows=829 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid <
'(11,0)'::tid))
                     Buffers: shared read=17

max_parallel_workers_per_gather=3;
               ->  Parallel Tid Range Scan on a  (cost=0.01..12.54
rows=802 width=0) (actual time=0.012..0.114 rows=622 loops=4)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid <
'(11,0)'::tid))
                     Buffers: shared hit=19

max_parallel_workers_per_gather=4;
               ->  Parallel Tid Range Scan on a  (cost=0.01..9.72
rows=621 width=0) (actual time=0.014..0.135 rows=497 loops=5)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid <
'(11,0)'::tid))
                     Buffers: shared hit=21

To fix this you need to make table_block_parallelscan_nextpage obey
the limits imposed by heap_setscanlimits().

The equivalent code in the non-parallel version is in
heapgettup_advance_block().

/* check if the limit imposed by heap_setscanlimits() is met */
if (scan->rs_numblocks != InvalidBlockNumber)
{
    if (--scan->rs_numblocks == 0)
        return InvalidBlockNumber;
}

I've not studied exactly how you'd get the rs_numblock information
down to the parallel scan descriptor.  But when you figure that out,
just remember that you can't do the --scan->rs_numblocks from
table_block_parallelscan_nextpage() as that's not parallel safe. You
might be able to add an or condition to: "if (nallocated >=
pbscan->phs_nblocks)" to make it "if (nallocated >=
pbscan->phs_nblocks || nallocated >= pbscan->phs_numblocks)",
although the field names don't seem very intuitive there. It would be
nicer if the HeapScanDesc field was called rs_blocklimit rather than
rs_numblocks.  It's not for this patch to go messing with that,
however.

David


Reply via email to