On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov <[email protected]> wrote:
> Right. When the number of trigrams is big, it is slow to scan the posting
> lists of all of them. The solution in this case is to exclude the most
> frequent trigrams from the index scan. But it requires some kind of
> statistics on trigram frequencies, which we don't have. We could estimate
> frequencies using some hard-coded assumptions about natural languages. Or
> we could exclude arbitrary trigrams. But I don't like either of these
> ideas. This problem is also relevant for LIKE/ILIKE search using trigram
> indexes.

I was thinking you could perhaps do it just based on the *number* of
trigrams, not necessarily their frequency.

> Probably you have some comments on idea of conversion from pg_wchar to
> multibyte? Is it acceptable at all?

Well, I'm not an expert on encodings, but it seems like a logical
extension of what we're doing right now, so I don't really see why
not. I'm confused by the diff hunks in pg_mule2wchar_with_len,
though. Presumably either the old code is right (in which case, don't
change it) or the new code is right (in which case, there's a bug fix
needed here that ought to be discussed and committed separately from
the rest of the patch). Maybe I am missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
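
For illustration only, the count-based idea could look roughly like the sketch below. It is not code from the patch or from pg_trgm: the cap MAX_SCAN_TRIGRAMS, both function names, and the evenly spaced selection are assumptions made up for this example, and the trigram extraction is a plain sliding window rather than pg_trgm's real extraction rules. It also assumes that every trigram of the fragment must match and that the index result is rechecked against the actual rows, so scanning only a subset can return extra candidates but never lose a true match.

```python
# Hypothetical cap on how many trigrams one index scan will look up.
MAX_SCAN_TRIGRAMS = 4


def trigrams(fragment):
    """Return the distinct trigrams of a literal fragment, in first-seen order.

    Simplified sliding window; not pg_trgm's actual extraction rules.
    """
    seen = {}
    for i in range(len(fragment) - 2):
        seen.setdefault(fragment[i:i + 3], None)
    return list(seen)


def trigrams_to_scan(fragment, limit=MAX_SCAN_TRIGRAMS):
    """Pick at most `limit` trigrams, based only on how many there are.

    No frequency statistics are used. When the fragment yields more than
    `limit` trigrams, an evenly spaced subset is kept.
    """
    all_trgm = trigrams(fragment)
    if len(all_trgm) <= limit:
        return all_trgm
    count = len(all_trgm)
    return [all_trgm[i * count // limit] for i in range(limit)]


if __name__ == "__main__":
    print(trigrams("postgres"))
    print(trigrams_to_scan("postgres"))
```

The trade-off is that a smaller subset means fewer posting lists to scan but more false candidates left for the recheck step to discard.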
