On Wed, Sep 27, 2006 at 12:30:43PM +0100, Simon Riggs wrote:
> On Tue, 2006-09-26 at 21:27 -0500, Jim C. Nasby wrote:
> > Since I don't recall any ideas ever having been thrown out on how to
> > do this...
> >
> > ISTM that we could gain additional insight on how many rows would
> > likely result from a join
> 
> One thing we can do is to use cross-column relationships to improve
> the estimation of Ndistinct.
> 
> Suppose we have a table
>   Order_line (PK orderId, lineNum)
> 
> If we look at lineNum and see it has on average 10 values per orderId,
> we can use that to compute that Ndistinct for orderId should be -0.1,
> i.e. the number of distinct values is proportional to the number of
> rows, with a factor of 10.
> 
> Right now, if there are more than 10 lineNums per orderId on average,
> we never decide that orderId is a scalable statistic.
> 
> I propose adding a final step to ANALYZE that applies a cross-column
> rule after all columns have been analysed. If all except one column of
> a PK have very low Ndistinct, we can use that to calculate a minimum
> Ndistinct for the remaining high-Ndistinct column. If the in-isolation
> Ndistinct estimate falls below that minimum, we overlay the new value.
> 
> This is desirable because the estimation of Ndistinct is very
> sensitive to the number of matching rows in the sample, so Ndistinct
> estimates are usually very poor when Ndistinct is large. The estimates
> for low-Ndistinct columns are much better, so we can use them, with
> their lower standard error, to correct the in-isolation estimates of
> the other columns.
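If I follow, the rule would work roughly like this (a rough Python
sketch just to check my understanding -- the function names, the stats
dict, and the "very low" cutoff of 10 are all made up, not anything in
the actual ANALYZE code):

    def scaled_ndistinct(ndistinct, nrows):
        # pg_statistic's stadistinct convention: a negative value means
        # -(ndistinct / nrows), i.e. the distinct count scales with the
        # table size
        return -(ndistinct / nrows)

    def overlay_pk_ndistinct(nrows, pk_stats):
        # pk_stats: PK column name -> in-isolation Ndistinct estimate.
        # Because PK combinations are unique, a column's distinct count
        # must be at least nrows divided by the product of the other PK
        # columns' distinct counts. Overlay that minimum when it exceeds
        # the in-isolation estimate.
        for col in list(pk_stats):
            others = [nd for c, nd in pk_stats.items() if c != col]
            if others and all(nd <= 10 for nd in others):  # "very low"
                minimum = float(nrows)
                for nd in others:
                    minimum /= nd
                if minimum > pk_stats[col]:  # estimate is too low
                    pk_stats[col] = minimum
        return pk_stats

    # 1,000,000 order lines, ~10 lineNums per orderId; suppose the
    # sample badly underestimated orderId's Ndistinct as 50,000.
    stats = overlay_pk_ndistinct(1000000,
                                 {'orderId': 50000, 'lineNum': 10})
    print(stats['orderId'])                              # 100000.0
    print(scaled_ndistinct(stats['orderId'], 1000000))   # -0.1

i.e. the PK bound corrects orderId's estimate up to 100,000 distinct
values, which would be stored as stadistinct = -0.1.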
But wouldn't overlaying the value screw us if we wanted to look up
something based on the unique field? (i.e. if there were a line_id in
order_line and we wanted to look something up based on line_id)
-- 
Jim Nasby                                    [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)