"Marinos J. Yannikos" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> No, they are not that easy to determine. In particular I think the idea >> of automatically feeding back error measurements is hopeless, because >> you cannot tell which parameters are wrong.
> Isn't it just a matter of solving an equation system with n variables (n
> being the number of parameters), where each equation stands for the
> calculation of the run time of a particular query?

If we knew all the variables involved, it might be (though since the
equations would be nonlinear, the solution would be more difficult than
you suppose). The real problems are:

1. There is lots of noise in any real-world measurement, mostly due to
competition from other processes.

2. There are effects we don't even try to model, such as the current
contents of kernel cache. Everybody who's done any work with Postgres
knows that for small-to-middling tables, running the same query twice in
a row will yield considerably different runtimes, because the second time
through all the data will be in kernel cache. But we don't have any
useful way to model that in the optimizer, since we can't see what the
kernel has in its buffers.

3. Even for the effects we do try to model, some of the equations are
pretty ad-hoc and might not fit real data very well. (I have little
confidence in the current correction for index order correlation, for
example.)

In short, if you just try to fit the present cost equations to real data,
what you'll get will inevitably be "garbage in, garbage out". You could
easily end up with parameter values that are much less realistic than the
defaults. Over time we'll doubtless improve the optimizer's cost models,
and someday we might get to a point where this wouldn't be a fool's
errand, but I don't see it happening in the foreseeable future.

I think a more profitable approach is to set up special test code to try
to approximate the value of individual parameters measured in isolation.
For instance, the current default of 4.0 for random_page_cost was
developed through rather extensive testing a few years ago, and I think
it's still a decent average value (for the case where you are actually
doing I/O, mind you). But if your disks have particularly fast or slow
seek times, maybe it's not good for you. It might be useful to package up
a test program that repeats those measurements on particular systems ---
though the problem of noisy measurements still applies. It is not easy or
cheap to get a measurement that isn't skewed by kernel caching behavior.
(You need a test file significantly larger than RAM, and even then you'd
better repeat the measurement quite a few times to see how much noise
there is in it.)

			regards, tom lane
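
P.S. To make the "equation system" idea concrete: in practice it would
amount to a nonlinear least-squares fit of cost parameters against
observed runtimes. Here is a rough, hypothetical sketch in Python; the
cost formula and the observation numbers are made-up placeholders for
illustration only, not the planner's real equations or real data:

    # Hypothetical illustration: fit random_page_cost and cpu_tuple_cost
    # to observed query runtimes by nonlinear least squares.  The "cost
    # model" here is a toy stand-in, not the real planner formulas.
    import numpy as np
    from scipy.optimize import least_squares

    # Invented placeholder measurements: (seq_pages, rand_pages, tuples,
    # runtime_ms) for a handful of queries.  Real data would be noisier.
    observations = np.array([
        [10000,      0, 500000,  820.0],
        [    0,   2000,   2000,  310.0],
        [ 5000,    500, 250000,  560.0],
        [20000,    100, 900000, 1650.0],
    ])

    def predicted_runtime(params, obs):
        # Toy model: runtime ~ k * (seq + rpc*rand + ctc*tuples)
        k, random_page_cost, cpu_tuple_cost = params
        seq, rand, tuples = obs[:, 0], obs[:, 1], obs[:, 2]
        return k * (seq + random_page_cost * rand + cpu_tuple_cost * tuples)

    def residuals(params):
        return predicted_runtime(params, observations) - observations[:, 3]

    fit = least_squares(residuals, x0=[0.05, 4.0, 0.01], bounds=(0, np.inf))
    print("fitted k, random_page_cost, cpu_tuple_cost:", fit.x)

With the real cost equations the fit is much less well behaved, and with
noisy runtimes and unmodeled effects (items 1-3 above) the fitted values
can easily come out worse than the defaults; that is the point.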
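
P.P.S. And here is the sort of test program I mean for random_page_cost,
again only a sketch: it assumes Python, 8kB pages, and that you have
already created a scratch file (called "scratch.dat" here, a hypothetical
name) significantly larger than RAM. It times a sequential pass against
random single-page reads and reports the ratio; run it several times to
see how much noise there is.

    # Rough sketch of a random_page_cost measurement.  Assumes
    # "scratch.dat" already exists and is much larger than RAM, so the
    # kernel cannot keep much of it cached between passes.
    import os, random, time

    PAGE = 8192                  # PostgreSQL page size
    PATH = "scratch.dat"         # hypothetical pre-created test file
    NPAGES = 20000               # number of random pages to sample

    size = os.path.getsize(PATH)
    total_pages = size // PAGE

    # Sequential pass: read the whole file in page-sized chunks.
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        while f.read(PAGE):
            pass
    seq_per_page = (time.perf_counter() - start) / total_pages

    # Random pass: seek to random page offsets and read one page each.
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for _ in range(NPAGES):
            f.seek(random.randrange(total_pages) * PAGE)
            f.read(PAGE)
    rand_per_page = (time.perf_counter() - start) / NPAGES

    print("sequential: %.6f ms/page" % (seq_per_page * 1000))
    print("random:     %.6f ms/page" % (rand_per_page * 1000))
    print("ratio (cf. random_page_cost): %.1f" % (rand_per_page / seq_per_page))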