Hi,

Situation:
We have a table with 3,500,000+ rows containing items that need to be
printed or have been printed previously. Most of these records have a
status of 'PRINTED', and we have a partial index on this table
WHERE status <> 'PRINTED'. During normal operation there will be fewer
than 10 records matching 'NOT_YET_PRINTED'. When the index scan is used
the query finishes in < 5 ms, but when a sequential scan is involved it
runs for > 500 ms.

We query this table often in the form:

    SELECT *
      FROM print_list
      JOIN [...]
      JOIN [...]
     WHERE status = 'NOT_YET_PRINTED'
     LIMIT 8;

This query currently switches between a sequential scan on the
print_list table and an index scan using the partial index mentioned
above. When doing an EXPLAIN ANALYZE on the query we see that the
planner sometimes expects more than 5000 rows to be returned, when in
reality fewer than 5 rows come back, for example:

    -> Index Scan using print_list_status_idx on print_list
       (cost=0.27..1138.53 rows=6073 width=56)
       (actual time=0.727..0.727 rows=0 loops=1)

Sometimes this results in the planner choosing a sequential scan for
this query.

When looking at pg_stats we sometimes see the following (note:
'NOT_YET_PRINTED' was not found during this analyze; these are real
values):

    attname           | status
    inherited         | f
    null_frac         | 0
    avg_width         | 4
    n_distinct        | 3
    most_common_vals  | {PRINTED}
    most_common_freqs | {0.996567}
    histogram_bounds  | {PREPARED,ERROR}
    correlation       | 0.980644

A question about this specific entry, which some of you may be able to
shed some light on: most_common_vals contains only 1 entry, why is
this? I would expect to see 3 entries, as n_distinct = 3.

When looking at
http://www.postgresql.org/docs/current/static/row-estimation-examples.html
we can see that an estimate of > 5000 rows is exactly what is to be
expected for these statistics:

    # select ( (1 - 0.996567)/2 * 3500000 )::int;
     int4
    ------
     6008
    (1 row)

But why does it not record the frequencies of 'PREPARED' and 'ERROR' in
most_common_*?

Our current strategy for mitigating this problem is decreasing the
autovacuum_*_scale_factor settings for this specific table, thereby
triggering more frequent analyzes and vacuums (example statements are
sketched at the end of this mail). This helps somewhat: when the
problem occurs it is usually resolved automatically once autoanalyze
runs on the table again, which currently happens many times per hour.

We could also increase the statistics target for this table, which
would cause the statistics to contain information about
'NOT_YET_PRINTED' more often; but even then ANALYZE may not sample any
of these records, as they sometimes do not exist at all.

Could you help us find a strategy to troubleshoot this issue further?

Some specific questions:

- We can see it is doing a sequential scan of the full table (3.5
  million records) even when it only expects ~8000 records to be
  returned; we would not expect a sequential scan to be chosen in that
  case.
- Why is most_common_* not filled when there are only 3 distinct
  values?
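For reference, the kind of per-table adjustments described above look
roughly like the following. This is only a sketch; the numbers are
illustrative and not our final settings, and the column/table names
match the ones used earlier in this mail:

    -- analyze/vacuum this table more aggressively than the global default
    ALTER TABLE print_list SET (
        autovacuum_analyze_scale_factor = 0.001,
        autovacuum_vacuum_scale_factor  = 0.001
    );

    -- larger statistics target for the status column, then refresh stats
    ALTER TABLE print_list ALTER COLUMN status SET STATISTICS 1000;
    ANALYZE print_list;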