Hi, we have a log table on one server with 1.9 million records.
One column "event" (type text) in that table is a string that (currently) takes a small number of distinct values (~43) (hmm that could have been normalised better). We noted on querying for events of a specific type, that the queries were slower than expected. It simply wasn't using the index (btree, default settings) on this column on this server (the test server, with less records, was fine). Using "ALTER TABLE SET STATISTICS" to increase the number of buckets to 50 resolved the issue, we went pretty much straight there on discovering there are no "HINTS". However we aren't quite sure why this case was pathological, and my brain doesn't grok the documentation quite. I assume that the histogram_bounds for strings are alphabetical in order, so that "DEMOSTART" falls between "DELETE" and "IDEMAIL". Even on a worst case of including both these common values, the planner ought to have assumed that less than <10% of records were likely covered by the value selected, so it seems unlikely to me that not using the index would be a good idea. What am I missing? (and yes there is a plan to upgrade!). => SELECT COUNT(*) FROM log WHERE event='DEMOSTART'; (...lots of time passes...) count ------- 1432 (1 row) => SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count; count | event --------+----------- 6 | DNRFAIL 14 | ADMDNR 14 | UPGRADE 18 | FOCRENEW 21 | AUTOCN 25 | ADMCC 27 | TEMPIN 31 | DNRCANCEL 43 | EXPIRED 128 | DIRECTBUY 130 | CANCEL 130 | CANCELQ 154 | FOCBUY 173 | EXPCCWARN 179 | OFFER 209 | DNROK 214 | TEMPRE 356 | CCWARN 429 | ADMLOGIN 719 | SUBSCRIBE 787 | CCSUCCESS 988 | CCFAILURE 1217 | TEMPNEW 1298 | PAYPAL 1431 | DEMOSTART 1776 | CCREQUEST 2474 | ACCTUPD 15169 | SYSMAINT 42251 | IDEMAIL 46964 | DELETE 50764 | RELOGIN 57022 | NEWUSR 64907 | PUBREC0 65449 | UNPUBLISH 92843 | LOGOUT 99018 | KILLSESS 128900 | UPLOAD 134994 | LOGIN 137608 | NEWPAGE 447556 | PUBREC1 489572 | PUBLISH => EXPLAIN SELECT * FROM log WHERE event='DEMOSTART'; QUERY PLAN ------------------------------------------------------------ Seq Scan on log (cost=0.00..54317.14 rows=20436 width=93) Filter: (event = 'DEMOSTART'::text) (2 rows) => ALTER TABLE log ALTER COLUMN events SET STATISTICS 50; ANALYSE LOG(event); ALTER TABLE ANALYZE => EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART'; QUERY PLAN ---------------------------------------------------------------------------- ------- Aggregate (cost=5101.43..5101.43 rows=1 width=0) -> Index Scan using log_event on log (cost=0.00..5098.15 rows=1310 width=0) Index Cond: (event = 'DEMOSTART'::text) (3 rows) => SELECT COUNT(*) FROM log WHERE event='DEMOSTART'; (...almost no time passes...) count ------- 1432 (1 row) BEFORE pajax=> select * from pg_stats where tablename = 'log' and attname='event'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-----------+---------+-----------+-----------+------------+--------------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------------------------------------------------------+------------- public | log | event | 0 | 10 | 25 | {PUBLISH,PUBREC1,NEWPAGE,UPLOAD,LOGIN,KILLSESS,LOGOUT} | {0.257333,0.248333,0.072,0.0696667,0.0613333,0.0543333,0.0506667} | {ACCTUPD,DELETE,IDEMAIL,NEWUSR,NEWUSR,PUBREC0,PUBREC0,RELOGIN,SYSMAINT,UNPUBLISH,UNPUBLISH} | 0.120881 (1 row) AFTER pajax=> select * from pg_stats where tablename='log' and attname='event'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-----------+---------+-----------+-----------+------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------- public | log | event | 0 | 10 | 32 | {PUBLISH,PUBREC1,NEWPAGE,LOGIN,UPLOAD,KILLSESS,LOGOUT,PUBREC0,UNPUBLISH,NEWUSR,RELOGIN,DELETE,IDEMAIL} | {0.249067,0.248533,0.0761333,0.0719333,0.0685333,0.0526,0.045,0.0368,0.0348667,0.029,0.0255333,0.0254667,0.0238667} | {ACCTUPD,ACCTUPD,ACCTUPD,ADMLOGIN,CCREQUEST,CCSUCCESS,DEMOSTART,FOCBUY,PAYPAL,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,TEMPNEW,TEMPRE} | 0.106671 (1 row) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance