Re: impact join syntax ?? and gist index ??

Marc Millas Sat, 07 Jan 2023 12:58:23 -0800

here they are: (I replace the column and table names) also I post 2 more
remarks, one on left join, and one on the test I did on postgres 15 postgis
3.3...


2023=# explain (analyze, buffers) select A.x, count(B.x) from A left join B
on st_within(B.geom, A.geom) group by A.x;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=212638398.98..212701792.16 rows=20 width=16) (actual
time=86717.857..86757.820 rows=11 loops=1)
   Group Key: A.x
   Buffers: shared hit=4243867
   ->  Sort  (cost=212638398.98..212659529.97 rows=8452398 width=16)
(actual time=86717.851..86727.334 rows=421307 loops=1)
         Sort Key: A.x
         Sort Method: quicksort  Memory: 37963kB
         Buffers: shared hit=4243867
         ->  Nested Loop Left Join  (cost=0.00..211521459.31 rows=8452398
width=16) (actual time=17.473..86642.332 rows=421307 loops=1)
               Join Filter: st_within(B.geom, A.geom)
               Rows Removed by Join Filter: 4229377
               Buffers: shared hit=4243867
               ->  Seq Scan on A  (cost=0.00..9.20 rows=20 width=17752)
(actual time=0.009..0.043 rows=11 loops=1)
                     Buffers: shared hit=9
               ->  Materialize  (cost=0.00..22309.83 rows=422789 width=40)
(actual time=0.001..23.392 rows=422789 loops=11)
                     Buffers: shared hit=15968
                     ->  Seq Scan on B  (cost=0.00..20195.89 rows=422789
width=40) (actual time=0.006..57.651 rows=422789 loops=1)
                           Buffers: shared hit=15968
 Planning Time: 0.693 ms
 Execution Time: 86763.087 ms
(19 lignes)


2023=# explain (analyze, buffers) select A.x, count(B.x) from A, B where
st_within(B.geom, A.geom) group by A.x;

 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=6301606.00..6301608.60 rows=20 width=16)
(actual time=11857.363..11863.212 rows=6 loops=1)
   Group Key: A.x
   Buffers: shared hit=2128836
   ->  Gather Merge  (cost=6301606.00..6301608.30 rows=20 width=16) (actual
time=11857.359..11863.207 rows=12 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         Buffers: shared hit=2128836
         ->  Sort  (cost=6300605.99..6300606.04 rows=20 width=16) (actual
time=11840.355..11840.356 rows=6 loops=2)
               Sort Key: A.x
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=2128836
               ->  Partial HashAggregate  (cost=6300605.36..6300605.56
rows=20 width=16) (actual time=11840.331..11840.332 rows=6 loops=2)
                     Group Key: A.x
                     Buffers: shared hit=2128825
                     ->  Nested Loop  (cost=0.13..6275745.36 rows=4971999
width=16) (actual time=0.505..11781.817 rows=210651 loops=2)
                           Buffers: shared hit=2128825
                           ->  Parallel Seq Scan on B  (cost=0.00..18454.99
rows=248699 width=40) (actual time=0.005..22.859 rows=211395 loops=2)
                                 Buffers: shared hit=15968
                           ->  Index Scan using A_geom_idx on A
(cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1
loops=422789)
                                 Index Cond: (geom ~ B.geom)
                                 Filter: st_within(B.geom, geom)
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=2112857
 Planning Time: 0.252 ms
 Execution Time: 11863.357 ms
(26 lignes)




Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Sat, Jan 7, 2023 at 9:40 PM Erik Wienhold <[email protected]> wrote:

> > On 07/01/2023 20:46 CET Marc Millas <[email protected]> wrote:
> >
> > Hi,
> >
> > postgres 12, postgis 3.0
> >
> > I have a small table A, 11 rows with a varchar column x and a geometry
> column y.
> > gist index on the geometry column.
> > the geometry do contains multipolygons (regions on a map)
> > I have a second table B , same structure, around 420 000 rows.
> > no index,
> > the geometry do contains points.
> > all geometries are on 4326 srid.
> >
> > If i ask to count points in each multipolygons:
> >
> > select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x;
> > it takes 11 seconds (everything in shared buffers).
> > If I do the very same thing as:
> > select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group
> by A.x;
> > same result, but 85 seconds (every thing in shared buffers, again)
> > if I redo asking with explain analyze, buffers, the plan is very
> different.
> >
> >
> > if I do create a gist index on geometry column of the big table,
> both syntax takes 21 seconds.
> >
> > I get the feeling I am missing something.. (at least 2 things...)
> > can someone shed some light ??
>
> Please provide the executions plans for both queries with and without the
> index on B.y.
>
> --
> Erik
>

Re: impact join syntax ?? and gist index ??

Reply via email to