Hello,

I would appreciate if someone explained me how exactly prepared parametrized
statements are planned, i.e. what kind of assumptions planner makes on param
values, selectivity, expected row count etc. that affect in particular
whether indexes will be used or not. For instance consider the following
case:

create table t1 ( id serial, val int);

insert into t1 (val)
select trunc(100000*random())
  from generate_series(1, 1000000);

create index idx_t1 on t1(val);

analyze t1;
------------------------------

1.
prepare stmt (int) as
 select * from t1 where val < $1;

explain execute stmt(100000);

-----------------------
QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=333332 width=8)
 Filter: (val < $1)

2.
prepare stmt (int, int) as
 select * from t1 where val > $1 and val < $2;

explain execute stmt(20000, 30000);

-----------------------
QUERY PLAN
Bitmap Heap Scan on t1  (cost= 151.74..5307.59 rows=5000 width=8)
  Recheck Cond: ((val > $1) AND (val < $2))
  ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
        Index Cond: ((val > $1) AND (val < $2))

Hmm, why does it expect 5000 rows here? What influences this expectation?

3.
prepare stmt (int) as
 select * from t1 where val = $1 or $1 is null;

explain execute stmt(20000);

QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
  Filter: ((val = $1) OR ($1 IS NULL))

That's the weirdest behavior: where did 5013 rows assumption came from? Why
use seq scan then? I should mention that planner refuses to use anything but
seq scan here even if I explicitly disable it with "set enable_seqscan to
off".

In general, I wonder if one could get somewhat predictable planner behavior
in such cases since we have a lot of code written in plpgsql and the
patterns above are pretty common there.

Thanks,
Viatcheslav

Reply via email to