On Fri, Dec 12, 2008 at 3:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Simon Riggs <si...@2ndquadrant.com> writes: >> On Fri, 2008-12-12 at 13:20 -0500, Tom Lane wrote: >>> Simon Riggs <si...@2ndquadrant.com> writes: >>>> Solutions can also include >>>> * manual mechanism for setting ndistinct that doesn't keep getting >>>> overwritten by subsequent ANALYZEs >>> >>> Hmm, that might actually be the most practical answer for large, >>> reasonably-static tables. Especially if we expose the "negative >>> stadistinct" feature to let people specify it as a fraction of table >>> size. > >> Works for me. Especially if you want to think more about ANALYZE before >> changing that. > > Well, it's something that would be sane to contemplate adding in 8.4. > It's way too late for any of this other stuff to happen in this release.
I'm thinking about trying to implement this, unless someone else is already planning to do it. I'm not sure it's practical to think about getting this into 8.4 at this point, but it's worth doing whether it does or not. The main problem I see here is that I don't know what the mechanism should be. My first thought was to use a reloption, but that won't work because reloptions are per-table and this setting is per-column. So my second thought is to add a column to pg_attribute and add a new DDL commands to modify it, maybe something like: ALTER TABLE name ALTER [COLUMN] column SET NDISTINCT 1672; ALTER TABLE name ALTER [COLUMN] column DROP NDISTINCT; Another option would be to invent a reloption-like syntax for columns: ALTER TABLE name ALTER [COLUMN] column SET (ndistinct = 1672); ALTER TABLE name ALTER [COLUMN] column RESET (ndistinct); This sort of seems cleaner and I can imagine it being useful for other purposes, but I'm not sure if I want to go to the trouble of building a completely general column-level reloptions mechanism ATM. It's also not entirely clear to me whether this option, wherever we put it, should be directly examined by the selectivity functions before consulting the statistics tuple, or whether it should merely override the value that ANALYZE puts into the statistics tuple. The latter seems simpler and more performant to me, but does lead to the possibly surprising result that changes don't take effect until the next ANALYZE. Thoughts? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers