Tom Lane wrote:
> Jan Urbański <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
> Well, (1) the normal measure would be statistics_target *tsvectors*,
> and we'd have to translate that to lexemes somehow; my proposal is just
> to use a fixed constant instead of tsvector width as in your original
> patch.  And (2) storing only statistics_target lexemes would be
> uselessly small and would guarantee that people *have to* set a custom
> target on tsvector columns to get useful results.  Obviously broken
> defaults are not my bag.

Fair enough, I'm fine with a multiplication factor.
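
To make that concrete, here's a rough sketch of what I have in mind
(the factor of 10 and the names are placeholders, not necessarily what
the final patch will use):

#define LEXEMES_PER_TARGET	10

static int
lexeme_budget(int attstattarget)
{
	/*
	 * The statistics target counts most-common *tsvectors*; multiply
	 * by a fixed constant to get a useful number of most-common
	 * *lexemes*, instead of deriving it from tsvector width.
	 */
	return attstattarget * LEXEMES_PER_TARGET;
}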

>> Also, the existing code decides which elements are worth storing as
>> most common ones by discarding those that are not frequent enough
>> (that's where num_mcv can get adjusted downwards). I mimicked that
>> for lexemes but maybe it just doesn't make sense?

> Well, that's not unreasonable either, if you can come up with a
> reasonable definition of "not frequent enough"; but that adds another
> variable to the discussion.

The current definition is "more occurrences than 0.001 of the total
row count, but no fewer than 2". It's copied right off
compute_minimal_stats(), and I have no problem with removing it. I
think its point is to guard against a situation where all elements are
more or less unique, and taking the top N would just give you some
random noise. It doesn't hurt, so I'd be for keeping the mechanism,
but if people feel differently, I'll just drop it.
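
In code, the guard amounts to roughly this (the names here are
invented for the sketch; compute_minimal_stats() in analyze.c is the
model):

typedef struct
{
	const char *lexeme;
	int			occurrences;	/* sample rows containing this lexeme */
} TrackItem;

/*
 * Keep a lexeme as "most common" only if it occurs in at least
 * mincount sample rows, where mincount = max(2, 0.001 * samplerows).
 * track[] is sorted by descending occurrence count, so trimming the
 * tail is enough; this is where num_mcv gets adjusted downwards.
 */
static int
clamp_num_mcv(TrackItem *track, int num_mcv, int samplerows)
{
	int			mincount = (int) (0.001 * samplerows);

	if (mincount < 2)
		mincount = 2;

	while (num_mcv > 0 && track[num_mcv - 1].occurrences < mincount)
		num_mcv--;

	return num_mcv;
}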

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
