On Wed, 2007-12-05 at 15:13 -0500, Chris Browne wrote:
> I have the theory (thus far not borne out by any numbers) that it
> might be a useful approach to try to go through the DB schema and use
> what information is there to try to come up with better numbers on a
> per-column basis.

Yeh, agreed. The difficulty is making this work for generic datatypes.

> - Datestamps tend to imply temporal dispersion, ergo "somewhat fewer
> bins." Similar for floats.

Hmmm, not sure about that one. Some date/time columns can change very
quickly over time, so the stats are frequently out of date.

> Then could come a "second order" perspective, where data would
> actually get sampled from pg_statistics.
>
> - If we look at the number of distinct histogram bins used, for a
> particular column, and find that there are some not used, we might
> drop bins.

The histograms are height balanced, so they are always all used.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
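
To illustrate why a height-balanced (equi-depth) histogram has no unused
bins, here is a minimal Python sketch. It is not PostgreSQL's ANALYZE code;
the function names height_balanced_bounds and bin_counts and the sample data
are hypothetical, chosen only for illustration. The boundaries are picked by
rank in the sorted sample rather than by value range, so every bin receives
roughly the same number of values even when the data is heavily skewed.

```python
def height_balanced_bounds(values, num_bins):
    """Return num_bins + 1 boundary values chosen by rank, so each bin
    holds roughly the same number of sampled values."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    last = len(ordered) - 1
    # Evenly spaced positions in the sorted sample, not in the value range.
    return [ordered[(i * last) // num_bins] for i in range(num_bins + 1)]


def bin_counts(values, bounds):
    """Count how many values fall into each bin defined by bounds."""
    counts = [0] * (len(bounds) - 1)
    for v in values:
        # Place v in the first bin whose upper boundary is >= v.
        for b in range(len(counts)):
            if v <= bounds[b + 1]:
                counts[b] += 1
                break
    return counts


# Hypothetical skewed sample: many small values, a few very large ones.
sample = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
          610, 987, 1597, 2584]
bounds = height_balanced_bounds(sample, 4)
counts = bin_counts(sample, bounds)
print(bounds)
print(counts)
```

Because the bin edges follow the data, asking for fewer bins makes each bin
coarser; it does not free up bins that were sitting empty.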