2016-03-07 15:01 GMT+02:00 Geoff Winkless <pgsqlad...@geoff.dj>:

> Unfortunately the cost of changing all the code that uses MIN() in
> this way would be higher than just adding an extra index :(
>
> I suppose the thought is that for selecting just the MIN() value, by
> traipsing through the index you immediately find the lowest match - so
> for a dataset where scdate cardinality is higher, this would make
> sense; indeed if I give this query a value with scdate in the low
> range of the table it returns quickly (although still slower than when
> it uses the scdate index).
>
> It seems to me that the weighting the planner applied to this MIN()
> rule is too high, or perhaps it needs to pay more attention to the
> statistics of the indexes for the WHERE clauses?
>
> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
>

Your `sc_id` and `scdate` columns are correlated.

The planner has no such knowledge and assumes the columns are independent. Your
`scdate` predicate is estimated to return 14394 rows (based on the EXPLAIN in
your first post). I think this corresponds to quite a small portion of your
table, less than 1% (based on `Rows Removed by Filter: 4068865` from the same
EXPLAIN). Under a uniform distribution these 14394 rows could be anywhere in
the table, so reading rows in the order of your PK index is optimal: you are
expected to hit a row that matches the given conditions quite soon.
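
(To put numbers on that expectation: the filter removed 4068865 rows, so the
table holds at least roughly 4.08M rows; 14394 / 4083259 is about 0.35%, so
under the uniformity assumption the planner expects the first match after
reading only about 4083259 / 14394 ≈ 284 index entries.)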

The problem is that your predicate matches a bunch of rows towards the end of
the table, which causes Postgres to read a big portion of your index before it
finds a row that fits.


Right now (9.5 and earlier versions) I do not know of any option that
would not require fixing your queries.
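
For illustration only (the table name `sc` and the equality predicate are my
assumptions, since your post does not show the full query): if the query has
the shape

    SELECT MIN(sc_id) FROM sc WHERE scdate = 20160219;

then one rewrite the planner can serve directly from the (scdate, sc_id)
index is

    -- same result as MIN(sc_id): with scdate pinned by the WHERE
    -- clause, the first index entry for that scdate holds the
    -- minimal sc_id
    SELECT sc_id
      FROM sc                    -- placeholder table name
     WHERE scdate = 20160219     -- placeholder literal
     ORDER BY sc_id
     LIMIT 1;

Note this exact trick does not carry over to a range predicate on scdate,
because the minimal sc_id is then not necessarily in the first index entry
of the range.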


P.S. Maybe the `Upper pathification` patch that is being considered for 9.6
can deal with such cases.


-- 
Victor Y. Yegorov
