Re: [HACKERS] tsvector pg_stats seems quite a bit off.

Tom Lane Sat, 29 May 2010 09:39:00 -0700

Jan Urbański <[email protected]> writes:
> [ e of ] s/2 or s/3 look reasonable.


The examples in the LC paper seem to all use e = s/10.  Note the stated
assumption e << s.

> So, should I just write a patch that sets the bucket width and pruning
> count using 0.07 as the assumed frequency of the most common word and
> epsilon equal to s/2 or s/3?

I'd go with s = 0.07 / desired-MCE-count and e = s / 10, at least for
a first cut to experiment with.
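
As a rough sketch of that arithmetic (not the actual patch): assuming the
usual Lossy Counting bucket width w = ceil(1/e), the parameters could be
derived as below. The function name and its arguments are hypothetical,
made up for illustration; only the 0.07 figure and the s and e formulas
come from this thread.

```python
import math


def lossy_counting_params(desired_mce_count, top_freq=0.07):
    """Derive Lossy Counting parameters from a target MCE count.

    top_freq is the assumed frequency of the most common word (0.07 in
    this thread). Names here are illustrative, not PostgreSQL identifiers.
    """
    if desired_mce_count <= 0:
        raise ValueError("desired_mce_count must be positive")

    # Minimum frequency (support) we want to be able to report.
    s = top_freq / desired_mce_count

    # Error bound; the LC paper assumes e << s, so use s / 10.
    e = s / 10

    # Standard Lossy Counting bucket width: prune once per w elements.
    bucket_width = math.ceil(1 / e)

    return s, e, bucket_width


if __name__ == "__main__":
    s, e, w = lossy_counting_params(100)
    print("s =", s, "e =", e, "bucket width =", w)
```

With a target of 100 entries this gives a bucket width of 14286, so e = s/10
costs a noticeably larger tracking structure than s/2 or s/3 would, which
is the tradeoff to watch when experimenting.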

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
