On Mon, Nov 19, 2012 at 10:05 AM, Alexander Korotkov <aekorot...@gmail.com> wrote:

> On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>
>> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
>> i.e., if query key contains multibyte characters. In this case, byte
>> length of the trigram string might be larger than three, and its CRC
>> is used as a trigram key instead of the trigram string itself. Because
>> of using CRC, we cannot do a partial-match. Attached patch extends
>> pg_trgm so that it compares a partial-match query key only when
>> KEEPONLYALNUM is enabled.
>>
>
> Didn't get this point. How does KEEPONLYALNUM guarantee that each trigram
> character is singlebyte?
>
> CREATE TABLE test (val TEXT);
> INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
> CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
> ANALYZE test;
> test=# SELECT * FROM test WHERE val LIKE '%aa%';
>  val
> ------
>  aa
>  aaa
>  шaaш
> (3 rows)
> test=# set enable_seqscan = off;
> SET
> test=# SELECT * FROM test WHERE val LIKE '%aa%';
>  val
> -----
>  aa
>  aaa
> (2 rows)
>
> I think we can use partial match only for singlebyte encodings. Or, at
> most, in cases when all alpha-numeric characters are singlebyte (have no
> idea how to check this).
>

Actually, I was also fiddling with the idea of partial match on trigrams
when I was working on the initial LIKE patch. But I concluded that we would
need a separate opclass which always keeps the full trigram in the entry.

------
With best regards,
Alexander Korotkov.
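
For illustration only, here is a minimal Python sketch of why the index scan in the quoted example can miss 'шaaш'. It is not pg_trgm code. It assumes words are padded with two leading spaces and one trailing space, that a trigram is stored as-is only when it is exactly three bytes in UTF-8, and that anything longer is replaced by a hash. zlib.crc32 stands in for whatever CRC pg_trgm really uses, the ("raw"/"crc") tags are a modeling device, and the substring test is a simplification of a real partial-match comparison. The function names are hypothetical.

```python
import zlib


def trigram_keys(word):
    """Return the index keys for one word as (kind, value) pairs."""
    # Assumed padding: two spaces in front, one space behind.
    padded = "  " + word.lower() + " "
    keys = set()
    for i in range(len(padded) - 2):
        raw = padded[i:i + 3].encode("utf-8")
        if len(raw) == 3:
            # Three single-byte characters: the trigram itself is the key.
            keys.add(("raw", raw))
        else:
            # A multibyte character makes the trigram longer than 3 bytes,
            # so only a 3-byte hash of it is kept (zlib.crc32 is a stand-in).
            keys.add(("crc", zlib.crc32(raw) & 0xFFFFFF))
    return keys


def partial_match(keys, fragment):
    """True if some stored trigram visibly contains the query fragment.

    Hashed keys say nothing about the characters they came from, so the
    sketch treats them as opaque and never matches against them.
    """
    return any(kind == "raw" and fragment in value for kind, value in keys)


rows = ["aa", "aaa", "шaaш"]
hits = [r for r in rows if partial_match(trigram_keys(r), b"aa")]
print(hits)  # ['aa', 'aaa']
```

Every trigram of 'шaaш' contains a multibyte character, so all of its keys end up hashed and the fragment 'aa' has nothing to match against. That is the same pair of rows the index scan returned above.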