Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

Mischa Sandberg Tue, 03 May 2005 14:35:42 -0700

Quoting Markus Schaber <[EMAIL PROTECTED]>:

> Hi, Josh,
> 
> Josh Berkus wrote:
> 
> > Yes, actually.  We need 3 different estimation methods:
> > 1 for tables where we can sample a large % of pages (say, >= 0.1)
> > 1 for tables where we sample a small % of pages but are "easily
> > estimated"
> > 1 for tables which are not easily estimated but we can't afford to
> > sample a large % of pages.
> >
> > If we're doing sampling-based estimation, I really don't want people
> > to lose sight of the fact that page-based random sampling is much
> > less expensive than row-based random sampling.  We should really be
> > focusing on methods which are page-based.


Okay, although given the track record of page-based sampling for
n-distinct, it's a bit like looking for your keys under the streetlight,
rather than in the alley where you dropped them :-)

How about applying the distinct-sampling filter on a small extra data
stream to the stats collector? 
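
To make the idea concrete, here is a rough sketch of a distinct-sampling
filter in the hash-and-halve style. It is only an illustration, not
PostgreSQL code: the class name DistinctSampler, the capacity parameter
and the choice of hash are all assumptions of mine, and the hook that
would feed values to the stats collector is not shown. Each value is
hashed; a value is kept only if the low `level` bits of its hash are
zero, and the level is raised whenever the sample outgrows its bound.

```python
import hashlib


class DistinctSampler:
    """Hypothetical bounded-memory filter for estimating n_distinct."""

    def __init__(self, capacity=1024):
        self.capacity = capacity  # maximum number of hashes retained
        self.level = 0            # each level halves the keep probability
        self.sample = set()       # hashes of the distinct values kept so far

    @staticmethod
    def _hash(value):
        # Deterministic 64-bit hash, so equal values always collide.
        digest = hashlib.blake2b(repr(value).encode("utf-8"),
                                 digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _passes(self, h):
        # Keep a hash only if its low `level` bits are all zero.
        return (h & ((1 << self.level) - 1)) == 0

    def add(self, value):
        h = self._hash(value)
        if not self._passes(h):
            return
        self.sample.add(h)
        # On overflow, raise the level and evict hashes that no longer pass.
        while len(self.sample) > self.capacity:
            self.level += 1
            self.sample = {x for x in self.sample if self._passes(x)}

    def estimate(self):
        # Each retained hash stands for about 2**level distinct values.
        return len(self.sample) * (1 << self.level)
```

Because the filter decides on the hash of the value rather than on the
row, duplicates never inflate the sample, and memory stays bounded by
capacity however many rows stream past.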

-- 
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.


