On 03/30/2017 12:14 AM, Tomas Vondra wrote:
I've only ran the queries on 10GB data set, but that should be enough.
The plans are from current master - I'll rerun the script on an older
release later today.
So, an plans from an older release (9.4) are attached. What seems to
matter is the join between partsupp and part, which is estimated like
this on 9.4:
-> Sort (cost=172852.06..173741.94 rows=355951 width=12)
(actual time=321.998..334.440 rows=86836 loops=1)
Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
Sort Method: quicksort Memory: 7143kB
-> Nested Loop (cost=0.43..140031.03 rows=355951 width=12)
(actual time=0.025..303.145 rows=86836 loops=1)
and like this on current master:
-> Sort (cost=146617.86..146819.89 rows=80809 width=12)
(actual time=562.513..575.599 rows=86836 loops=1)
Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
Sort Method: quicksort Memory: 7143kB
-> Nested Loop (cost=0.43..140031.03 rows=80809 width=12)
(actual time=0.054..536.003 rows=86836 loops=1)
which however seems clearly more accurate than 9.4. So perhaps there is
some bug in computing the mergejoin estimate, but it's also possible
correcting the estimate (lowering it) also has some impact.
But joining to the aggregated subquery also clearly plays some role,
because without it the estimates are higher.
Another interesting observation is that only the foreign key between
part/partsupp seems to matter - once it gets dropped, the estimates get
back close to 9.4 values.
What is however strange is that changing max_parallel_workers_per_gather
affects row estimates *above* the Gather node. That seems a bit, um,
suspicious, no? See the parallel-estimates.log.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
tpch=# \d lineitem;
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
l_orderkey | integer | | not null |
l_partkey | integer | | not null |
l_suppkey | integer | | not null |
l_linenumber | integer | | not null |
l_quantity | numeric | | |
l_extendedprice | numeric | | |
l_discount | numeric | | |
l_tax | numeric | | |
l_returnflag | character(1) | | |
l_linestatus | character(1) | | |
l_shipdate | date | | |
l_commitdate | date | | |
l_receiptdate | date | | |
l_shipinstruct | character varying(25) | | |
l_shipmode | character varying(10) | | |
l_comment | character varying(44) | | |
Indexes:
"lineitem_pkey" PRIMARY KEY, btree (l_orderkey, l_linenumber)
"idx_lineitem_orderkey" btree (l_orderkey)
"idx_lineitem_part_supp" btree (l_partkey, l_suppkey)
"idx_lineitem_shipdate" btree (l_shipdate, l_discount, l_quantity)
Foreign-key constraints:
"lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
"lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey)
tpch=# \d part
Table "public.part"
Column | Type | Collation | Nullable | Default
---------------+-----------------------+-----------+----------+-----------------------------------------
p_partkey | integer | | not null | nextval('part_p_partkey_seq'::regclass)
p_name | character varying(55) | | |
p_mfgr | character(25) | | |
p_brand | character(10) | | |
p_type | character varying(25) | | |
p_size | integer | | |
p_container | character(10) | | |
p_retailprice | numeric | | |
p_comment | character varying(23) | | |
Indexes:
"part_pkey" PRIMARY KEY, btree (p_partkey)
tpch=# \d partsupp
Table "public.partsupp"
Column | Type | Collation | Nullable | Default
---------------+------------------------+-----------+----------+---------
ps_partkey | integer | | not null |
ps_suppkey | integer | | not null |
ps_availqty | integer | | |
ps_supplycost | numeric | | |
ps_comment | character varying(199) | | |
Indexes:
"partsupp_pkey" PRIMARY KEY, btree (ps_partkey, ps_suppkey)
"idx_partsupp_partkey" btree (ps_partkey)
"idx_partsupp_suppkey" btree (ps_suppkey)
Foreign-key constraints:
"partsupp_ps_suppkey_fkey" FOREIGN KEY (ps_suppkey) REFERENCES supplier(s_suppkey)
Referenced by:
TABLE "lineitem" CONSTRAINT "lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp(ps_partkey, ps_suppkey)
tpch=# set max_parallel_workers_per_gather = 0;
SET
tpch=# explain analyze select
ps_suppkey
from
partsupp,
(
select
l_partkey agg_partkey,
l_suppkey agg_suppkey
from
lineitem
group by
l_partkey,
l_suppkey
) agg_lineitem
where
agg_partkey = ps_partkey
and agg_suppkey = ps_suppkey
and ps_partkey in (
select
p_partkey
from
part
where
p_name like 'hot%'
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=172852.62..2123144.13 rows=524 width=4) (actual time=400.426..16490.077 rows=86801 loops=1)
Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
-> Group (cost=0.56..1857642.81 rows=5998611 width=8) (actual time=0.025..15634.679 rows=7995644 loops=1)
Group Key: lineitem.l_partkey, lineitem.l_suppkey
-> Index Only Scan using idx_lineitem_part_supp on lineitem (cost=0.56..1557712.25 rows=59986112 width=8) (actual time=0.023..11362.271 rows=59983724 loops=1)
Heap Fetches: 0
-> Sort (cost=172852.06..173741.94 rows=355951 width=12) (actual time=399.428..408.893 rows=86836 loops=1)
Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
Sort Method: quicksort Memory: 7143kB
-> Nested Loop (cost=0.43..140031.03 rows=355951 width=12) (actual time=0.048..374.713 rows=86836 loops=1)
-> Seq Scan on part (cost=0.00..65961.68 rows=20202 width=4) (actual time=0.024..294.117 rows=21709 loops=1)
Filter: ((p_name)::text ~~ 'hot%'::text)
Rows Removed by Filter: 1978291
-> Index Only Scan using partsupp_pkey on partsupp (cost=0.43..3.49 rows=18 width=8) (actual time=0.003..0.003 rows=4 loops=21709)
Index Cond: (ps_partkey = part.p_partkey)
Heap Fetches: 0
Planning time: 0.765 ms
Execution time: 16493.032 ms
(18 rows)
tpch=# set max_parallel_workers_per_gather = 8;
SET
tpch=# explain analyze select
ps_suppkey
from
partsupp,
(
select
l_partkey agg_partkey,
l_suppkey agg_suppkey
from
lineitem
group by
l_partkey,
l_suppkey
) agg_lineitem
where
agg_partkey = ps_partkey
and agg_suppkey = ps_suppkey
and ps_partkey in (
select
p_partkey
from
part
where
p_name like 'hot%'
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=131617.44..2094902.90 rows=131 width=4) (actual time=161.644..17339.329 rows=86801 loops=1)
Merge Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND (partsupp.ps_suppkey = lineitem.l_suppkey))
-> Sort (cost=131616.88..131839.35 rows=88988 width=12) (actual time=160.581..170.669 rows=86836 loops=1)
Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
Sort Method: quicksort Memory: 7143kB
-> Nested Loop (cost=1000.43..124301.47 rows=88988 width=12) (actual time=0.456..122.994 rows=86836 loops=1)
-> Gather (cost=1000.00..50232.12 rows=20202 width=4) (actual time=0.437..30.468 rows=21709 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on part (cost=0.00..47211.92 rows=5050 width=4) (actual time=0.078..110.786 rows=4342 loops=5)
Filter: ((p_name)::text ~~ 'hot%'::text)
Rows Removed by Filter: 395658
-> Index Only Scan using partsupp_pkey on partsupp (cost=0.43..3.49 rows=18 width=8) (actual time=0.003..0.004 rows=4 loops=21709)
Index Cond: (ps_partkey = part.p_partkey)
Heap Fetches: 0
-> Materialize (cost=0.56..1932625.44 rows=5998611 width=8) (actual time=0.029..16521.530 rows=7995644 loops=1)
-> Group (cost=0.56..1857642.81 rows=5998611 width=8) (actual time=0.024..15703.698 rows=7995644 loops=1)
Group Key: lineitem.l_partkey, lineitem.l_suppkey
-> Index Only Scan using idx_lineitem_part_supp on lineitem (cost=0.56..1557712.25 rows=59986112 width=8) (actual time=0.023..11412.152 rows=59983724 loops=1)
Heap Fetches: 0
Planning time: 0.912 ms
Execution time: 17342.450 ms
(22 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=175035.85..2125423.94 rows=556 width=4) (actual time=588.529..16343.750 rows=86801 loops=1)
Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
-> Group (cost=0.56..1857643.36 rows=5998614 width=8) (actual time=0.071..15284.268 rows=7995644 loops=1)
Group Key: lineitem.l_partkey, lineitem.l_suppkey
-> Index Only Scan using idx_lineitem_part_supp on lineitem (cost=0.56..1557712.67 rows=59986140 width=8) (actual time=0.069..11030.543 rows=59983724 loops=1)
Heap Fetches: 0
-> Sort (cost=175035.28..175957.14 rows=368743 width=12) (actual time=587.760..592.176 rows=86836 loops=1)
Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
Sort Method: quicksort Memory: 7143kB
-> Nested Loop (cost=0.43..140940.83 rows=368743 width=12) (actual time=0.111..567.359 rows=86836 loops=1)
-> Seq Scan on part (cost=0.00..65962.39 rows=20202 width=4) (actual time=0.030..375.667 rows=21709 loops=1)
Filter: ((p_name)::text ~~ 'hot%'::text)
Rows Removed by Filter: 1978291
-> Index Only Scan using partsupp_pkey on partsupp (cost=0.43..3.53 rows=18 width=8) (actual time=0.007..0.008 rows=4 loops=21709)
Index Cond: (ps_partkey = part.p_partkey)
Heap Fetches: 0
Planning time: 1.339 ms
Execution time: 16346.623 ms
(18 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=221656.40..473437.31 rows=19 width=4) (actual time=672.981..2047.363 rows=650135 loops=1)
Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
-> Sort (cost=221617.04..222975.00 rows=543181 width=12) (actual time=672.919..701.066 rows=650135 loops=1)
Sort Key: lineitem.l_partkey, lineitem.l_suppkey
Sort Method: quicksort Memory: 55052kB
-> Nested Loop (cost=0.56..169876.14 rows=543181 width=12) (actual time=0.026..545.387 rows=650135 loops=1)
-> Seq Scan on part (cost=0.00..65962.39 rows=20202 width=4) (actual time=0.015..297.862 rows=21709 loops=1)
Filter: ((p_name)::text ~~ 'hot%'::text)
Rows Removed by Filter: 1978291
-> Index Only Scan using idx_lineitem_part_supp on lineitem (cost=0.56..4.87 rows=27 width=8) (actual time=0.003..0.009 rows=30 loops=21709)
Index Cond: (l_partkey = part.p_partkey)
Heap Fetches: 0
-> Index Only Scan using partsupp_pkey on partsupp (cost=0.43..207746.81 rows=7999892 width=8) (actual time=0.018..781.298 rows=8563023 loops=1)
Heap Fetches: 0
Planning time: 0.929 ms
Execution time: 2065.439 ms
(16 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.00..669273.00 rows=22 width=372) (actual time=0.064..4025.504 rows=650135 loops=1)
-> Nested Loop (cost=0.43..218417.03 rows=80806 width=274) (actual time=0.032..513.791 rows=86836 loops=1)
-> Seq Scan on part (cost=0.00..65962.39 rows=20202 width=130) (actual time=0.020..297.632 rows=21709 loops=1)
Filter: ((p_name)::text ~~ 'hot%'::text)
Rows Removed by Filter: 1978291
-> Index Scan using idx_partsupp_partkey on partsupp (cost=0.43..7.37 rows=18 width=144) (actual time=0.008..0.009 rows=4 loops=21709)
Index Cond: (ps_partkey = part.p_partkey)
-> Index Scan using idx_lineitem_part_supp on lineitem (cost=0.56..5.57 rows=1 width=98) (actual time=0.007..0.039 rows=7 loops=86836)
Index Cond: ((l_partkey = partsupp.ps_partkey) AND (l_suppkey = partsupp.ps_suppkey))
Planning time: 0.923 ms
Execution time: 4043.329 ms
(11 rows)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers