On 03/30/2017 12:14 AM, Tomas Vondra wrote:

I've only ran the queries on 10GB data set, but that should be enough. The plans are from current master - I'll rerun the script on an older release later today.


So, an plans from an older release (9.4) are attached. What seems to matter is the join between partsupp and part, which is estimated like this on 9.4:

   ->  Sort  (cost=172852.06..173741.94 rows=355951 width=12)
             (actual time=321.998..334.440 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=0.43..140031.03 rows=355951 width=12)
                      (actual time=0.025..303.145 rows=86836 loops=1)

and like this on current master:

  ->  Sort  (cost=146617.86..146819.89 rows=80809 width=12)
            (actual time=562.513..575.599 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=0.43..140031.03 rows=80809 width=12)
                     (actual time=0.054..536.003 rows=86836 loops=1)

which however seems clearly more accurate than 9.4. So perhaps there is some bug in computing the mergejoin estimate, but it's also possible correcting the estimate (lowering it) also has some impact.

But joining to the aggregated subquery also clearly plays some role, because without it the estimates are higher.

Another interesting observation is that only the foreign key between part/partsupp seems to matter - once it gets dropped, the estimates get back close to 9.4 values.

What is however strange is that changing max_parallel_workers_per_gather affects row estimates *above* the Gather node. That seems a bit, um, suspicious, no? See the parallel-estimates.log.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
tpch=# \d lineitem;
                         Table "public.lineitem"
     Column      |         Type          | Collation | Nullable | Default 
-----------------+-----------------------+-----------+----------+---------
 l_orderkey      | integer               |           | not null | 
 l_partkey       | integer               |           | not null | 
 l_suppkey       | integer               |           | not null | 
 l_linenumber    | integer               |           | not null | 
 l_quantity      | numeric               |           |          | 
 l_extendedprice | numeric               |           |          | 
 l_discount      | numeric               |           |          | 
 l_tax           | numeric               |           |          | 
 l_returnflag    | character(1)          |           |          | 
 l_linestatus    | character(1)          |           |          | 
 l_shipdate      | date                  |           |          | 
 l_commitdate    | date                  |           |          | 
 l_receiptdate   | date                  |           |          | 
 l_shipinstruct  | character varying(25) |           |          | 
 l_shipmode      | character varying(10) |           |          | 
 l_comment       | character varying(44) |           |          | 
Indexes:
    "lineitem_pkey" PRIMARY KEY, btree (l_orderkey, l_linenumber)
    "idx_lineitem_orderkey" btree (l_orderkey)
    "idx_lineitem_part_supp" btree (l_partkey, l_suppkey)
    "idx_lineitem_shipdate" btree (l_shipdate, l_discount, l_quantity)
Foreign-key constraints:
    "lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
    "lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey)

tpch=# \d part
                                          Table "public.part"
    Column     |         Type          | Collation | Nullable |                 Default                 
---------------+-----------------------+-----------+----------+-----------------------------------------
 p_partkey     | integer               |           | not null | nextval('part_p_partkey_seq'::regclass)
 p_name        | character varying(55) |           |          | 
 p_mfgr        | character(25)         |           |          | 
 p_brand       | character(10)         |           |          | 
 p_type        | character varying(25) |           |          | 
 p_size        | integer               |           |          | 
 p_container   | character(10)         |           |          | 
 p_retailprice | numeric               |           |          | 
 p_comment     | character varying(23) |           |          | 
Indexes:
    "part_pkey" PRIMARY KEY, btree (p_partkey)

tpch=# \d partsupp
                         Table "public.partsupp"
    Column     |          Type          | Collation | Nullable | Default 
---------------+------------------------+-----------+----------+---------
 ps_partkey    | integer                |           | not null | 
 ps_suppkey    | integer                |           | not null | 
 ps_availqty   | integer                |           |          | 
 ps_supplycost | numeric                |           |          | 
 ps_comment    | character varying(199) |           |          | 
Indexes:
    "partsupp_pkey" PRIMARY KEY, btree (ps_partkey, ps_suppkey)
    "idx_partsupp_partkey" btree (ps_partkey)
    "idx_partsupp_suppkey" btree (ps_suppkey)
Foreign-key constraints:
    "partsupp_ps_suppkey_fkey" FOREIGN KEY (ps_suppkey) REFERENCES supplier(s_suppkey)
Referenced by:
    TABLE "lineitem" CONSTRAINT "lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey)

tpch=# set max_parallel_workers_per_gather = 0;
SET
tpch=# explain analyze select
        ps_suppkey
from
        partsupp,
        (
                select
                        l_partkey agg_partkey,
                        l_suppkey agg_suppkey
                from
                        lineitem
                group by
                        l_partkey,
                        l_suppkey
        ) agg_lineitem
where
        agg_partkey = ps_partkey
        and agg_suppkey = ps_suppkey
        and ps_partkey in (
                select
                        p_partkey
                from
                        part
                where
                        p_name like 'hot%'
        );
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=172852.62..2123144.13 rows=524 width=4) (actual time=400.426..16490.077 rows=86801 loops=1)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Group  (cost=0.56..1857642.81 rows=5998611 width=8) (actual time=0.025..15634.679 rows=7995644 loops=1)
         Group Key: lineitem.l_partkey, lineitem.l_suppkey
         ->  Index Only Scan using idx_lineitem_part_supp on lineitem  (cost=0.56..1557712.25 rows=59986112 width=8) (actual time=0.023..11362.271 rows=59983724 loops=1)
               Heap Fetches: 0
   ->  Sort  (cost=172852.06..173741.94 rows=355951 width=12) (actual time=399.428..408.893 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=0.43..140031.03 rows=355951 width=12) (actual time=0.048..374.713 rows=86836 loops=1)
               ->  Seq Scan on part  (cost=0.00..65961.68 rows=20202 width=4) (actual time=0.024..294.117 rows=21709 loops=1)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
                     Rows Removed by Filter: 1978291
               ->  Index Only Scan using partsupp_pkey on partsupp  (cost=0.43..3.49 rows=18 width=8) (actual time=0.003..0.003 rows=4 loops=21709)
                     Index Cond: (ps_partkey = part.p_partkey)
                     Heap Fetches: 0
 Planning time: 0.765 ms
 Execution time: 16493.032 ms
(18 rows)

tpch=# set max_parallel_workers_per_gather = 8;
SET
tpch=# explain analyze select
        ps_suppkey
from
        partsupp,
        (
                select
                        l_partkey agg_partkey,
                        l_suppkey agg_suppkey
                from
                        lineitem
                group by
                        l_partkey,
                        l_suppkey
        ) agg_lineitem
where
        agg_partkey = ps_partkey
        and agg_suppkey = ps_suppkey
        and ps_partkey in (
                select
                        p_partkey
                from
                        part
                where
                        p_name like 'hot%'
        );
                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=131617.44..2094902.90 rows=131 width=4) (actual time=161.644..17339.329 rows=86801 loops=1)
   Merge Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND (partsupp.ps_suppkey = lineitem.l_suppkey))
   ->  Sort  (cost=131616.88..131839.35 rows=88988 width=12) (actual time=160.581..170.669 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=1000.43..124301.47 rows=88988 width=12) (actual time=0.456..122.994 rows=86836 loops=1)
               ->  Gather  (cost=1000.00..50232.12 rows=20202 width=4) (actual time=0.437..30.468 rows=21709 loops=1)
                     Workers Planned: 4
                     Workers Launched: 4
                     ->  Parallel Seq Scan on part  (cost=0.00..47211.92 rows=5050 width=4) (actual time=0.078..110.786 rows=4342 loops=5)
                           Filter: ((p_name)::text ~~ 'hot%'::text)
                           Rows Removed by Filter: 395658
               ->  Index Only Scan using partsupp_pkey on partsupp  (cost=0.43..3.49 rows=18 width=8) (actual time=0.003..0.004 rows=4 loops=21709)
                     Index Cond: (ps_partkey = part.p_partkey)
                     Heap Fetches: 0
   ->  Materialize  (cost=0.56..1932625.44 rows=5998611 width=8) (actual time=0.029..16521.530 rows=7995644 loops=1)
         ->  Group  (cost=0.56..1857642.81 rows=5998611 width=8) (actual time=0.024..15703.698 rows=7995644 loops=1)
               Group Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Index Only Scan using idx_lineitem_part_supp on lineitem  (cost=0.56..1557712.25 rows=59986112 width=8) (actual time=0.023..11412.152 rows=59983724 loops=1)
                     Heap Fetches: 0
 Planning time: 0.912 ms
 Execution time: 17342.450 ms
(22 rows)
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=175035.85..2125423.94 rows=556 width=4) (actual time=588.529..16343.750 rows=86801 loops=1)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Group  (cost=0.56..1857643.36 rows=5998614 width=8) (actual time=0.071..15284.268 rows=7995644 loops=1)
         Group Key: lineitem.l_partkey, lineitem.l_suppkey
         ->  Index Only Scan using idx_lineitem_part_supp on lineitem  (cost=0.56..1557712.67 rows=59986140 width=8) (actual time=0.069..11030.543 rows=59983724 loops=1)
               Heap Fetches: 0
   ->  Sort  (cost=175035.28..175957.14 rows=368743 width=12) (actual time=587.760..592.176 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=0.43..140940.83 rows=368743 width=12) (actual time=0.111..567.359 rows=86836 loops=1)
               ->  Seq Scan on part  (cost=0.00..65962.39 rows=20202 width=4) (actual time=0.030..375.667 rows=21709 loops=1)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
                     Rows Removed by Filter: 1978291
               ->  Index Only Scan using partsupp_pkey on partsupp  (cost=0.43..3.53 rows=18 width=8) (actual time=0.007..0.008 rows=4 loops=21709)
                     Index Cond: (ps_partkey = part.p_partkey)
                     Heap Fetches: 0
 Planning time: 1.339 ms
 Execution time: 16346.623 ms
(18 rows)

                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=221656.40..473437.31 rows=19 width=4) (actual time=672.981..2047.363 rows=650135 loops=1)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Sort  (cost=221617.04..222975.00 rows=543181 width=12) (actual time=672.919..701.066 rows=650135 loops=1)
         Sort Key: lineitem.l_partkey, lineitem.l_suppkey
         Sort Method: quicksort  Memory: 55052kB
         ->  Nested Loop  (cost=0.56..169876.14 rows=543181 width=12) (actual time=0.026..545.387 rows=650135 loops=1)
               ->  Seq Scan on part  (cost=0.00..65962.39 rows=20202 width=4) (actual time=0.015..297.862 rows=21709 loops=1)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
                     Rows Removed by Filter: 1978291
               ->  Index Only Scan using idx_lineitem_part_supp on lineitem  (cost=0.56..4.87 rows=27 width=8) (actual time=0.003..0.009 rows=30 loops=21709)
                     Index Cond: (l_partkey = part.p_partkey)
                     Heap Fetches: 0
   ->  Index Only Scan using partsupp_pkey on partsupp  (cost=0.43..207746.81 rows=7999892 width=8) (actual time=0.018..781.298 rows=8563023 loops=1)
         Heap Fetches: 0
 Planning time: 0.929 ms
 Execution time: 2065.439 ms
(16 rows)

                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.00..669273.00 rows=22 width=372) (actual time=0.064..4025.504 rows=650135 loops=1)
   ->  Nested Loop  (cost=0.43..218417.03 rows=80806 width=274) (actual time=0.032..513.791 rows=86836 loops=1)
         ->  Seq Scan on part  (cost=0.00..65962.39 rows=20202 width=130) (actual time=0.020..297.632 rows=21709 loops=1)
               Filter: ((p_name)::text ~~ 'hot%'::text)
               Rows Removed by Filter: 1978291
         ->  Index Scan using idx_partsupp_partkey on partsupp  (cost=0.43..7.37 rows=18 width=144) (actual time=0.008..0.009 rows=4 loops=21709)
               Index Cond: (ps_partkey = part.p_partkey)
   ->  Index Scan using idx_lineitem_part_supp on lineitem  (cost=0.56..5.57 rows=1 width=98) (actual time=0.007..0.039 rows=7 loops=86836)
         Index Cond: ((l_partkey = partsupp.ps_partkey) AND (l_suppkey = partsupp.ps_suppkey))
 Planning time: 0.923 ms
 Execution time: 4043.329 ms
(11 rows)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to