This one looks *really* good. http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf
It does require a single full table scan but it works in O(n) time and constant space and it guarantees the confidence intervals for the estimates it provides like the histograms do for regular range scans. It can even keep enough data to provide estimates for n_distinct when unrelated predicates are applied. I'm not sure Postgres would want to do this though; this seems like it's part of the cross-column correlation story more than the n_distinct story. It seems to require keeping an entire copy of the sampled record in the stats tables which would be prohibitive quickly in wide tables (it would be O(n^2) storage in the number of columns) . It also seems like a lot of work to implement. Nothing particular that would be impossible, but it does require storing a moderately complex data structure. Perhaps Postgres's new support for data structures will make this easier. -- greg ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly