Mary Edie Meredith wrote:
> I certainly don't claim that it is appropriate to force customers into a
> full analysis, particularly if random sampling versus a full scan of the
> data reveals little to no performance difference in the plans. Being
> able to sample accurately is _very_ nice for large tables.
>
> For our testing purposes, however, consistent results are extremely
> important. We have observed that a small difference in the plan for one
> of the 22 queries can cause a difference in the DBT-3 results. If this
> happens, a small performance difference between runs on two Linux
> kernels may appear to be due to the kernels, when in fact it is due to
> the plan change.
>
> We know that the plans are _exactly_ the same if the data in the
> pg_statistics table is the same from run to run (all other things being
> equal). So what we need is identical optimizer costs (pg_statistics)
> for the same table data on each run.
>
> I feel certain that the pg_statistics table will be identical from run
> to run if analyze looks at every row. Thus our hope to find a way to
> get that.
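One way to check the premise above (identical statistics contents from run to run) is to snapshot the statistics catalog after each ANALYZE and diff the snapshots. A rough sketch, assuming a database named `dbt3` (a placeholder name; note the catalog is actually spelled `pg_statistic`):

```shell
# Snapshot the planner statistics after the first run's ANALYZE.
psql -d dbt3 -c "ANALYZE;"
psql -d dbt3 -A -t -c \
  "SELECT * FROM pg_statistic ORDER BY starelid, staattnum;" > stats_run1.txt

# ... reload the data and re-ANALYZE for the second run ...

psql -d dbt3 -A -t -c \
  "SELECT * FROM pg_statistic ORDER BY starelid, staattnum;" > stats_run2.txt

# diff exits 0 (and prints nothing) when the statistics match exactly.
diff stats_run1.txt stats_run2.txt && echo "statistics identical"
```

If the diff is empty, any remaining run-to-run variation in the benchmark cannot be blamed on the sampled statistics.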
Actually, if you are using GEQO (many tables in a join), the optimizer itself will randomly try plans --- even worse than random statistics. We do have:

	#geqo_random_seed = -1		# -1 = use variable seed

which lets you force a specific random seed for testing purposes. I wonder if that could be extended to control VACUUM randomization too. Right now it just controls GEQO, and in fact it gets reset on every optimizer run. I wonder if you could just poke a srandom(10) into src/backend/commands/analyze.c.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073