Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

Jan Urbański Thu, 10 Jul 2008 14:27:36 -0700

Tom Lane wrote:

The way I think it ought to work is that the number of lexemes stored in
the final pg_statistic entry is statistics_target times a constant
(perhaps 100).  I don't like having it vary depending on tsvector width

I think the existing code puts at most statistics_target elements in a
pg_statistic tuple. In compute_minimal_stats() num_mcv starts with
stats->attr->attstattarget and is adjusted only downwards.

My original thought was to keep that property for tsvectors (i.e. store
at most statistics_target lexemes) and advise people to set it high for
their tsvector columns (e.g. 100x their default).

Also, the existing code decides which elements are worth storing as most
common ones by discarding those that are not frequent enough (that's
where num_mcv can get adjusted downwards). I mimicked that for lexemes
but maybe it just doesn't make sense?
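To make the "starts at the target, adjusted only downwards" behaviour concrete, here is a rough Python sketch of the idea. It is not the code from compute_minimal_stats(): the function name pick_most_common and the min_count cutoff are made up for illustration, and the real cutoff rule in the analyze code is different.

```python
def pick_most_common(counts, target, min_count=2):
    """Keep at most `target` entries, then shrink the list further if the
    tail is not frequent enough.

    counts    -- dict mapping element -> number of occurrences in the sample
    target    -- upper bound on stored entries (plays the role of num_mcv)
    min_count -- hypothetical "frequent enough" cutoff, an assumption here
    """
    # Most frequent first; ties broken by element so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    # Start from the target and only ever move downwards.
    num_mcv = min(target, len(ranked))
    while num_mcv > 0 and ranked[num_mcv - 1][1] < min_count:
        num_mcv -= 1

    return ranked[:num_mcv]
```

With counts of {"a": 5, "b": 3, "c": 1, "d": 1} and a target of 3, only "a" and "b" survive, because the third candidate is below the cutoff.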

But in any case, given a target number of lexemes to accumulate,
I'd suggest pruning with that number as the bucket width (pruning
distance).  Or perhaps use some multiple of the target number, but
the number itself seems about right.

Fine with me, I'm too tired to do the math now, so I'll take your word
for it :)
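For the record, this is roughly how I read "pruning with that number as the bucket width", written as a small Python sketch rather than the actual patch. It assumes a Lossy Counting style pass, which is where the bucket width terminology comes from; the names lossy_count and top_lexemes are made up for illustration.

```python
def lossy_count(stream, bucket_width):
    """One pass over `stream`, pruning rare items every `bucket_width` items.

    Returns a dict mapping item -> [count, delta], where delta is the
    maximum number of occurrences that may have been missed before the
    item was (re)inserted.
    """
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")

    entries = {}
    current_bucket = 1
    for n, item in enumerate(stream, start=1):
        if item in entries:
            entries[item][0] += 1
        else:
            # A new item could have been seen and pruned in earlier buckets.
            entries[item] = [1, current_bucket - 1]

        if n % bucket_width == 0:
            # Bucket boundary: drop everything that is provably infrequent.
            doomed = [k for k, (f, d) in entries.items()
                      if f + d <= current_bucket]
            for k in doomed:
                del entries[k]
            current_bucket += 1
    return entries


def top_lexemes(entries, target):
    """Return up to `target` (lexeme, count) pairs, most frequent first."""
    ranked = sorted(entries.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [(lex, f) for lex, (f, _d) in ranked[:target]]
```

Here bucket_width would be the target number of lexemes (or a multiple of it), and top_lexemes() would pick what ends up in the statistics tuple. Any count that survives is too low by at most its delta.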


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
