On Fri, May 11, 2012 at 4:11 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fu...@gmail.com> writes:
>> On Fri, May 11, 2012 at 12:07 AM, MauMau <maumau...@gmail.com> wrote:
>>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>>> consider what you taught. And I'll consider if the tentative measure of
>>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>>> against Japanese text.
>
>> In Japanese, it's common to do a text search with two characters keyword.
>> But since pg_trgm is 3-gram, you basically would not be able to use index
>> for such text search. So you might need something like pg_bigm or pg_unigm
>> for Japanese text search.

Even if an index could be used for a two-character text search, the bitmap
index scan would pick up all rows, so it would be too slow.

> I believe the trigrams are three *bytes* not three characters. So a
> couple of kanji should work just fine for this.

Really? As far as I can tell from reading the pg_trgm code, a trigram is three
characters, and its CRC32 is used as the index key if its size is more than
three bytes.

Regards,

-- 
Fujii Masao
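
For illustration only, the behavior described above (three-character trigrams,
with a CRC32-derived key when a trigram is longer than three bytes) can be
sketched in Python. This is not the pg_trgm source: the function names are
hypothetical, zlib.crc32 is only a stand-in for whatever CRC pg_trgm computes,
and case folding and word splitting are left out. The padding of two leading
spaces and one trailing space follows the pg_trgm documentation.

```python
import zlib


def char_trigrams(word: str) -> list[str]:
    """Split one word into three-CHARACTER trigrams (hypothetical helper)."""
    # pg_trgm treats each word as having two spaces in front and one behind.
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def index_key(trigram: str) -> bytes:
    """Reduce one trigram to a three-byte key (hypothetical helper)."""
    raw = trigram.encode("utf-8")
    if len(raw) <= 3:
        # The trigram already fits in three bytes, so it is used as-is.
        return raw
    # Assumption: zlib.crc32 stands in for the CRC used by pg_trgm. The point
    # is only that a multibyte trigram is hashed down to a fixed-size key.
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


if __name__ == "__main__":
    for word in ("ab", "日本"):
        for tg in char_trigrams(word):
            size = len(tg.encode("utf-8"))
            print(repr(tg), size, "bytes ->", index_key(tg).hex())
```

Under these assumptions, a two-kanji keyword still yields trigrams such as
" 日本", which take seven bytes in UTF-8 and are therefore stored as hashed
keys rather than as raw bytes.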