On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:
> 2. In a single scan, it is possible to estimate n_distinct by using
> a very simple algorithm:
>
> "Distinct sampling for highly-accurate answers to distinct value
> queries and event reports" by Gibbons, VLDB 2001.
>
> http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

That looks like the one...

...though it looks like using it would require some more complex changes
to the current algorithm, and we want the other stats as well...

> 3. In fact, Gibbons's basic idea has been extended to "sliding windows"
> (this extension is useful in streaming systems like Aurora / Stream):
>
> "Distributed streams algorithms for sliding windows"
> by Gibbons and Tirthapura, SPAA 2002.
>
> http://home.eng.iastate.edu/~snt/research/tocs.pdf
>

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command.

Best Regards, Simon Riggs