On 28/05/10 04:47, Tom Lane wrote:
> Jan Urbański <wulc...@wulczer.org> writes:
>> On 19/05/10 21:01, Jesper Krogh wrote:
>>> In practice, just cranking the statistics estimate up high enough seems
>>> to solve the problem, but doesn't there seem to be something wrong in
>>> how the statistics are collected?
>
>> The algorithm to determine most common vals does not do it accurately.
>> That would require keeping all lexemes from the analysed tsvectors in
>> memory, which would be impractical. If you want to learn more about the
>> algorithm being used, try reading
>> http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
>> ts_typanalyze.c
>
> I re-scanned that paper and realized that there is indeed something
> wrong with the way we are doing it.
> So I think we have to fix this.

Hm, I'll try to take another look this evening (CEST).

Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)