Rod Taylor <[EMAIL PROTECTED]> writes:

> On Tue, 2005-04-26 at 19:03 -0400, Greg Stark wrote:
> > This one looks *really* good.
> >
> >   http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf
> >
> > It does require a single full table scan
>
> Ack.. Not by default please.
>
> I have a few large append-only tables (vacuum isn't necessary) which do
> need stats rebuilt periodically.
The algorithm can also naturally be implemented incrementally, which would
be nice for your append-only tables. But that's not Postgres's current
philosophy with statistics. Perhaps a trigger function that you could
install yourself to update the statistics for each newly inserted record
would be useful.

The paper is pretty straightforward and easy to read, but here's an
executive summary:

The goal is to gather a uniform sample of *distinct values* in the table,
as opposed to a sample of records. Instead of using a fixed sampling rate
for each record, use a hash of the value to decide whether to include it.
At first include everything, but whenever the sample space overflows,
throw out half the values based on their hash. Repeat until the scan is
finished. In the end you have a sample of 1/2^n of the distinct values in
the entire data set, where n is just large enough for that sample to fit
in your predetermined constant sample space.
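To make that concrete, here's a minimal sketch in Python of the hash-based
trick described above. It's my own illustration, not code from the paper:
the DistinctSample class, the capacity parameter, and SHA-256 as the hash
function are all just placeholder choices.

import hashlib

def h(value, salt="demo"):
    # Hash a value to a big integer; assumed roughly uniform.
    digest = hashlib.sha256((salt + repr(value)).encode()).digest()
    return int.from_bytes(digest, "big")

class DistinctSample:
    # Keep a value only if the low `level` bits of its hash are all
    # zero, so the surviving set is a uniform 1/2^level sample of the
    # distinct values seen so far.
    def __init__(self, capacity):
        self.capacity = capacity   # fixed sample-space budget
        self.level = 0             # sampling rate is 2**-level
        self.sample = set()

    def add(self, value):
        if h(value) % (1 << self.level) == 0:
            self.sample.add(value)
            # Overflow: raise the level, which throws out (about)
            # half the values based on their hash, as described above.
            while len(self.sample) > self.capacity:
                self.level += 1
                self.sample = {v for v in self.sample
                               if h(v) % (1 << self.level) == 0}

    def estimate_distinct(self):
        # Each survivor stands in for 2**level distinct values.
        return len(self.sample) * (1 << self.level)

Feed every value through add() -- in a single full scan, or incrementally
from an insert trigger -- and at any point estimate_distinct() scales the
surviving sample back up by 2^level to estimate the number of distinct
values.

-- 
greg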