On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov <aekorot...@gmail.com> wrote:
>> I was thinking you could perhaps do it just based on the *number* of
>> trigrams, not necessarily their frequency.
>
> Imagine we've two queries:
> 1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
> 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';
>
> The first query require reading posting lists of trigrams "abc" and "bcd".
> The second query require reading posting lists of trigrams "abc", "bcd",
> "cde", "def", "efg", "fgh", "ghi", "hij" and "ijk".
> We could decide to use index scan for first query and sequential scan for
> second query because number of posting list to read is high. But it is
> unreasonable because actually second query is narrower than the first one.
> We can use same index scan for it, recheck will remove all false positives.
> When number of trigrams is high we can just exclude some of them from index
> scan. It would be better than just decide to do sequential scan. But the
> question is what trigrams to exclude? Ideally we would leave most rare
> trigrams to make index scan cheaper.
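
(For illustration only: a minimal Python sketch of the idea quoted above,
i.e. extract the trigrams of a LIKE pattern and, when there are many, keep
only the rarest few for the index scan. This is not pg_trgm code. The
function names, the frequency table, and the limit are all hypothetical,
and padding at word boundaries, case folding, and escape handling are
deliberately ignored.)

```python
import re

def like_trigrams(pattern):
    """Return the distinct trigrams of the literal runs in a LIKE pattern.

    Simplification: every run between wildcards is treated as unpadded,
    which matches the '%abcd%' examples above but not patterns anchored
    at a word boundary.
    """
    trigrams = []
    for run in re.split(r"[%_]", pattern):
        for i in range(len(run) - 2):
            trigram = run[i:i + 3]
            if trigram not in trigrams:
                trigrams.append(trigram)
    return trigrams

def pick_rarest(trigrams, freq, limit):
    """Keep at most `limit` trigrams, preferring the lowest estimated frequency.

    `freq` is an assumed mapping from trigram to estimated posting-list size;
    trigrams missing from it are treated as the rarest.
    """
    return sorted(trigrams, key=lambda t: freq.get(t, 0))[:limit]

# Made-up posting-list sizes, purely to exercise the functions.
HYPOTHETICAL_FREQ = {
    "abc": 900, "bcd": 700, "cde": 650, "def": 400, "efg": 120,
    "fgh": 15, "ghi": 40, "hij": 8, "ijk": 300,
}

short_query = like_trigrams("%abcd%")         # 2 trigrams
long_query = like_trigrams("%abcdefghijk%")   # 9 trigrams
scan_keys = pick_rarest(long_query, HYPOTHETICAL_FREQ, 3)
```

With these invented numbers the longer pattern would be scanned using only
"hij", "fgh" and "ghi", and the recheck would discard the false positives.
The open question from the quoted text remains where such frequency
estimates would come from.
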
True.  I guess I was thinking more of the case where you've got
abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|....  There's probably some point
at which it becomes silly to think about using the index.

>> > Probably you have some comments on idea of conversion from pg_wchar to
>> > multibyte? Is it acceptable at all?
>>
>> Well, I'm not an expert on encodings, but it seems like a logical
>> extension of what we're doing right now, so I don't really see why
>> not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
>> though.  Presumably either the old code is right (in which case, don't
>> change it) or the new code is right (in which case, there's a bug fix
>> needed here that ought to be discussed and committed separately from
>> the rest of the patch).  Maybe I am missing something.
>
> Unfortunately I didn't understand original logic of pg_mule2wchar_with_len.
> I just did proposal about how it could be. I hope somebody more familiar
> with this code would clarify this situation.

Well, do you think the current code is buggy, or not?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers