After setting seq_page_cost to 3 the execution plan became good, without SeqScan, but it seems strange to set seq_page_cost almost equal to random_page_cost, therefore i've set seq_page_cost back to defaults, increased the statistics for "sub_id" in "mba_test.subscr_param" to 1000. That gave me the value of estimated number of rows for subscr_param_pkey closer to the real value (est.64, real 30) which affected the execution plan in the right way.
>Thursday, June 29, 2017 7:21 PM +03:00 from Mikhail <bem...@mail.ru>: > > >Hi guys, >I'm loss. I'm running: >=# select version(); > version >---------------------------------------------------------------------------------------------------------- >PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 >(Red Hat 4.8.5-11), 64-bit > >=# show work_mem; >work_mem >---------- >27MB > >=# show shared_buffers; >shared_buffers >---------------- >3520MB > > >Having the tables: >CREATE TABLE mba_test.subscr_param( > sub_id integer NOT NULL, > res_id integer NOT NULL, > rel_status integer, > status integer, > and so on.. total 35 columns > >CREATE TABLE mba_test.subscr ( > sub_id integer NOT NULL, > sub_name character varying(80), > status integer NOT NULL, > acc_id integer, > and so on.. total 59 columns > >alter table mba_test.subscr_param add primary key (sub_id, res_id); >alter table mba_test.subscr add primary key (sub_id); >create index idx_subscr_acc_id on mba_test.subscr(acc_id); > >Tables and indexes has the following sizes / statistics: >=# select relname, relpages, reltuples, pg_size_pretty(pg_relation_size(oid)) > from pg_class > where relname in ('subscr_param', 'subscr', 'idx_subscr_acc_id', >'subscr_pkey', 'subscr_param_pkey'); > relname | relpages | reltuples | pg_size_pretty >-----------------------+----------+-------------+---------------- >subscr | 24086 | 825305 | 188 MB >subscr_param_pkey | 115968 | 4.22936e+07 | 906 MB >subscr_param | 1446158 | 4.22936e+07 | 11 GB >subscr_pkey | 2265 | 825305 | 18 MB >idx_subscr_acc_id | 2265 | 825305 | 18 MB > >And "subscr" data distribution is: >=# select acc_id, count(*) from mba_test.subscr group by acc_id order by >count(*) desc limit 5; > acc_id | count >---------+------- >1089212 | 5232 >1000154 | 2884 >1587496 | 1896 >1409682 | 1746 >1249568 | 1149 > >=# select count(*) from mba_test.subscr; >count >-------- >825305 > >=# select count(*) from mba_test.subscr_param; > count >---------- >42293572 > >Now, i take the second acc_id (1000154) and run the query below twice (to have >cached everything i can). The second execution gives the following: > >=# explain (analyze, buffers) SELECT "SP".res_id, "SP".sub_id > FROM mba_test.subscr_param "SP" > JOIN mba_test.subscr "S" ON "SP".sub_id = "S".sub_id >WHERE "S".acc_id = 1000154; > QUERY PLAN >----------------------------------------------------------------------------------------------------------------------------------------------------- >Hash Join (cost=7841.72..2036917.93 rows=138159 width=8) (actual >time=39.501..10086.843 rows=86933 loops=1) > Hash Cond: ("SP".sub_id = "S".sub_id) > Buffers: shared hit=178674 read=1269448 > -> Seq Scan on subscr_param "SP" (cost=0.00..1869093.72 rows=42293572 >width=8) (actual time=0.024..6294.100 rows=42293572 loops=1) > Buffers: shared hit=176710 read=1269448 > -> Hash (cost=7808.02..7808.02 rows=2696 width=4) (actual >time=3.161..3.161 rows=2884 loops=1) > Buckets: 4096 Batches: 1 Memory Usage: 134kB > Buffers: shared hit=1964 > -> Bitmap Heap Scan on subscr "S" (cost=53.32..7808.02 rows=2696 >width=4) (actual time=0.471..2.802 rows=2884 loops=1) > Recheck Cond: (acc_id = 1000154) > Heap Blocks: exact=1953 > Buffers: shared hit=1964 > -> Bitmap Index Scan on idx_subscr_acc_id (cost=0.00..52.64 >rows=2696 width=0) (actual time=0.273..0.273 rows=2884 loops=1) > Index Cond: (acc_id = 1000154) > Buffers: shared hit=11 >Planning time: 0.155 ms >Execution time: 10091.265 ms > >Seems strange to decide to seq scan the table with 42 mln rows and size 11GB >when having the index (actually, primary key) containing both columns (sub_id >and res_id) which is less than 1GB. > >Now, i've explicitly turned the sec scans off and got perfect execution plan: > >=# set enable_seqscan = off; >=# explain (analyze, buffers) SELECT "SP".res_id, "SP".sub_id > FROM mba_test.subscr_param "SP" > JOIN mba_test.subscr "S" ON "SP".sub_id = "S".sub_id >WHERE "S".acc_id = 1000154; > >QUERY PLAN >----------------------------------------------------------------------------------------------------------------------------------------------------------------- >Nested Loop (cost=53.88..4954693.91 rows=138159 width=8) (actual >time=0.471..62.315 rows=86933 loops=1) > Buffers: shared hit=50837 > -> Bitmap Heap Scan on subscr "S" (cost=53.32..7808.02 rows=2696 width=4) >(actual time=0.459..3.250 rows=2884 loops=1) > Recheck Cond: (acc_id = 1000154) > Heap Blocks: exact=1953 > Buffers: shared hit=1964 > -> Bitmap Index Scan on idx_subscr_acc_id (cost=0.00..52.64 >rows=2696 width=0) (actual time=0.258..0.258 rows=2884 loops=1) > Index Cond: (acc_id = 1000154) > Buffers: shared hit=11 > -> Index Only Scan using subscr_param_pkey on subscr_param "SP" >(cost=0.56..1825.67 rows=923 width=8) (actual time=0.004..0.017 rows=30 >loops=2884) > Index Cond: (sub_id = "S".sub_id) > Heap Fetches: 86933 > Buffers: shared hit=48873 >Planning time: 0.169 ms >Execution time: 66.644 ms > >67 milliseconds vs 10 seconds! While the cost is two times bigger (4954693 vs >2036917). >My thoughts are: taking into account that the estimated number of rows to >fetch on bitmap heap scan is approx. right (est. 2696 - real 2884), the >problem is with index scan cost estimation, which results to 4920200 >(2696*1825). And the miss in cost estimation is because of the estimation of the number of rows to return (est. 923 - real 30). > >And my question: > 1. am i right in my hypothesis? > 2. is there a way to affect the cost evaluation in my case to help >postgresql choose the right execution plan? > 3. is there a way to fix this query and not to break the execution of other >queries? > >ps: running "analyze" on both tables doesn't affect the result >pps: all "cost" parameters are in their default values: cpu_index_tuple_cost, >cpu_operator_cost, cpu_tuple_cost, random_page_cost, seq_page_cost. > >Thanks, Mikhail >----------------------------------------------------------------------