On Sat, Mar 25, 2006 at 05:38:26PM +0000, Simon Riggs wrote:
> On Sat, 2006-03-25 at 16:24 +0100, Martijn van Oosterhout wrote:
>
> > I agree. However, if it's the overhead of calling gettimeofday() that
> > slows everything down, perhaps we should tackle that end. For example,
> > have a sampling mode that only times say 5% of the executed nodes.
> >
> > EXPLAIN ANALYZE SAMPLE blah;
>
> I like this idea. Why not do this all the time? I'd say we don't need
> the SAMPLE clause at all, just do this for all EXPLAIN ANALYZEs.

I was wondering about that. But then you may run into weird results if
a subselect takes a long time for just a few values. But maybe it should
be the default, with a FULL mode to say you want to measure
everything.

> Something even simpler? First 40 plus 5% random sample after that? I'd
> prefer a random sample so we have the highest level of trust in the
> numbers produced. Otherwise we might accidentally introduce bias from
> systematic effects such as nested loops queries speeding up towards the
> end of their run. (I know we would do that at the start, but we are
> stuck because we don't know the population size ahead of time and we
> know we need a reasonable number of data points).

Well, I was wondering whether a fixed percentage is appropriate. 5% of 10
million is still a lot of samples for possibly not a lot of benefit. The
followup email suggested sampling that happens less and less often as the
number of tuples increases, in a logarithmic way. But if we could add
some randomness, that'd be cool. The question is, what's the overhead of
calling random()?

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
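
To make the two sampling schemes discussed above a bit more concrete, here
is a rough sketch in Python rather than backend C. Everything in it is an
assumption for illustration: the names sample_probability and
estimate_total are made up, the 40/(n+1) decay is just one possible reading
of "less often as the number of tuples increases", and scaling each timed
sample by 1/p is one plausible way to estimate the total, not something
anyone proposed in the thread.

```python
import random

def sample_probability(n, warmup=40, rate=0.05, decaying=False):
    """Probability of timing the n-th (0-based) execution of a node.

    Hypothetical helper: the first `warmup` executions are always timed.
    After that, either a fixed `rate` is used, or (decaying=True) the
    probability falls off as warmup/(n+1), so the expected number of
    timed executions grows roughly with log(n).
    """
    if n < warmup:
        return 1.0
    if decaying:
        return warmup / (n + 1)
    return rate

def estimate_total(costs, rng, **kwargs):
    """Estimate the sum of `costs` while only "timing" a random sample.

    Each sampled cost is scaled by 1/p, so the estimate stays unbiased
    even though most executions are never measured (an assumption about
    how one might extrapolate, not part of the original proposal).
    Returns (estimated_total, number_of_samples_taken).
    """
    total = 0.0
    sampled = 0
    for n, cost in enumerate(costs):
        p = sample_probability(n, **kwargs)
        # Skip the random draw entirely while every execution is timed.
        if p >= 1.0 or rng.random() < p:
            total += cost / p
            sampled += 1
    return total, sampled

if __name__ == "__main__":
    costs = [1.0] * 100_000  # stand-in for per-tuple timings
    for decaying in (False, True):
        est, k = estimate_total(costs, random.Random(0), decaying=decaying)
        print(f"decaying={decaying}: {k} samples, estimated total {est:.0f}")
```

With the fixed 5% rate the number of timed executions grows linearly with
the tuple count, whereas the decaying variant takes only a few hundred
samples for 100,000 tuples, at the price of a noisier estimate. Note that
both variants still pay for one random draw per execution once past the
warmup, which is exactly the overhead in question.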