AW: [HACKERS] analyze.c

Zeugswetter Andreas SB Mon, 16 Oct 2000 05:38:34 -0700


> > I've been reading something about implementation of histograms, and,
> > AFAIK, in practice histograms is just a cool name for no more than:
> >    1. top ten with frequency for each
> >    2. the same for top ten worse
> >    3. average for the rest

Consider that we only need this info for the choice of index. If an average value
were too frequent for the index to be efficient, you could safely drop the index;
it would be useless.
Thus it seems to me that keeping stats on the most infrequent values (point 2) is
useless.
In my view these would also be the most volatile, so the stats would only be
accurate for a short period of time.

I think what we need is as follows:
1. our current histograms 
2. a list of exceptions for exceptional values that are very frequent
 
Exceptional values are those that would skew the distribution too much.

Very infrequent values should not be used for min|max values of histogram buckets,
but that is imho all that needs to be done for infrequent values.
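
To make the proposal concrete, here is a rough sketch of how the two pieces could fit together. It is an illustration only, not what analyze.c does: the function names, the 5% cutoff for "exceptional", and the use of an equal-depth histogram as a stand-in for "our current histograms" are all assumptions made for the example.

```python
from collections import Counter


def build_stats(values, n_buckets=10, exception_fraction=0.05):
    """Return (exceptions, bounds, avg_freq) for a sample of column values.

    exception_fraction is a hypothetical cutoff: any value making up at
    least this share of the sample is treated as exceptional.
    """
    total = len(values)
    counts = Counter(values)

    # Point 2: exceptional values, each stored with its own frequency.
    exceptions = {v: c / total for v, c in counts.items()
                  if c / total >= exception_fraction}

    # Point 1: a histogram over the remaining values only, so the frequent
    # values cannot skew the bucket boundaries.
    rest = sorted(v for v in values if v not in exceptions)
    bounds = []
    if rest:
        # Equal-depth boundaries: n_buckets + 1 evenly spaced positions.
        bounds = [rest[(len(rest) - 1) * i // n_buckets]
                  for i in range(n_buckets + 1)]

    # Average frequency of one non-exceptional value.
    n_rest_distinct = len(counts) - len(exceptions)
    avg_freq = len(rest) / total / n_rest_distinct if n_rest_distinct else 0.0
    return exceptions, bounds, avg_freq


def eq_selectivity(value, exceptions, avg_freq):
    """Estimated fraction of rows matching `column = value`."""
    return exceptions.get(value, avg_freq)
```

With such stats, the index decision reduces to comparing eq_selectivity() against whatever fraction makes an index scan worthwhile: exceptional values get their own estimate, everything else gets the average. The sketch does not attempt to keep very infrequent values out of the bucket boundaries.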

Andreas
