Greg,

> I'm convinced these two are more connected than you believe.
Actually, I think they are inseparable.

> I might be interested in implementing that algorithm that was posted a
> while back that involved generating good unbiased samples of discrete
> values. The algorithm was quite clever and well described, and the paper
> claimed impressively good results.
>
> However, it will only make sense if people are willing to accept that
> analyze will need a full table scan -- at least for tables where the DBA
> knows that good n_distinct estimates are necessary.

What about block-based sampling? Sampling 1 in 20 disk pages, rather than 1 in 20 rows, should require significantly less scanning, and yet give us a large enough sample for reasonable accuracy.

> > 3. We don't have any method to analyze inter-column correlation within
> > a table;
> >
> > 4. We don't keep statistics on foreign key correlation;
>
> Gosh, these would be nice, but they sound like hard problems. Has anybody
> even made any headway in brainstorming how to tackle them?

There's no time like the present!

Actually, both of these seem like fairly straightforward problems storage-wise. The hard part is deriving the statistics, especially for tables with many columns or FKs.

> > 5. random_page_cost (as previously discussed) is actually a function
> > of relatively immutable hardware statistics, and as such should not
> > need to exist as a GUC once the cost model is fixed.
>
> I don't think that's true at all. Not all hardware is the same.
>
> Certainly the need to twiddle this GUC should be drastically reduced if
> the cache effects are modelled properly and the only excuses left are
> legitimate hardware differences.

OK ... but it should still become a "little knob" rather than the "big knob" it is currently.
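To put rough numbers on the block-sampling argument, here is a minimal Python sketch. It assumes a hypothetical table of 100,000 pages holding exactly 100 rows each and compares how many pages each strategy must read at a 1-in-20 sampling rate. The function names are made up for illustration and have nothing to do with ANALYZE's actual code.

```python
"""Rough comparison of pages read by row-level vs. page-level sampling.

Hypothetical assumptions, for illustration only: a table of `num_pages`
heap pages, each holding exactly `rows_per_page` rows, sampled at 1 in `k`.
"""


def expected_pages_row_sampling(num_pages: int, rows_per_page: int, k: int) -> float:
    # Each row is picked independently with probability 1/k, so a page is
    # skipped only if none of its rows is picked: (1 - 1/k) ** rows_per_page.
    p_page_touched = 1.0 - (1.0 - 1.0 / k) ** rows_per_page
    return num_pages * p_page_touched


def pages_page_sampling(num_pages: int, k: int) -> int:
    # Sampling 1 in k pages reads exactly that many pages (rounded up).
    return -(-num_pages // k)


if __name__ == "__main__":
    pages, rows, k = 100_000, 100, 20
    row_based = expected_pages_row_sampling(pages, rows, k)
    page_based = pages_page_sampling(pages, k)
    print(f"row sampling touches ~{row_based:,.0f} of {pages:,} pages")
    print(f"page sampling touches {page_based:,} of {pages:,} pages")
```

Under these assumptions, row sampling is expected to touch about 99% of the pages (1 - 0.95^100 is roughly 0.994), while page sampling reads 5,000 pages, yet both yield roughly 500,000 sampled rows. The catch is that rows on the same page are not independent, so how accurate a page sample is for n_distinct will depend on how the values are physically clustered.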
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco