On Fri, May 11, 2012 at 4:11 AM, Tom Lane <[email protected]> wrote:
> Fujii Masao <[email protected]> writes:
>> On Fri, May 11, 2012 at 12:07 AM, MauMau <[email protected]> wrote:
>>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>>> consider what you taught. And I'll consider if the tentative measure of
>>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>>> against Japanese text.
>
>> In Japanese, it's common to do a text search with two characters keyword.
>> But since pg_trgm is 3-gram, you basically would not be able to use index
>> for such text search. So you might need something like pg_bigm or pg_unigm
>> for Japanese text search.

Even if an index can be used for a two-character text search, the bitmap index
scan picks up all rows, so it's too slow.

> I believe the trigrams are three *bytes* not three characters. So a
> couple of kanji should work just fine for this.

Really? As far as I read the code of pg_trgm, a trigram is three characters,
and its CRC32 is used as the index key if its size is more than three bytes.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
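
To make the characters-versus-bytes distinction above concrete, here is a minimal Python sketch. It is not pg_trgm's code: the function names `char_trigrams` and `index_key` are hypothetical, the padding (two spaces before the word, one after) follows the behaviour described in the pg_trgm documentation, and `zlib.crc32` merely stands in for whatever CRC variant and key width the module really uses. Lowercasing and word splitting are left out.

```python
import zlib


def char_trigrams(word: str) -> list[str]:
    """Return 3-character windows over the padded word.

    Assumption: the word is padded with two leading spaces and one
    trailing space; this is a simplified model, not the module's code.
    """
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def index_key(trigram: str) -> bytes:
    """Return a key for one trigram.

    A trigram whose encoding is exactly three bytes is used as-is.
    Anything longer (multibyte characters) is replaced by a CRC32 of
    its bytes. Assumption: zlib.crc32 and a 4-byte key are stand-ins
    for illustration only.
    """
    encoded = trigram.encode("utf-8")
    if len(encoded) == 3:
        return encoded
    return zlib.crc32(encoded).to_bytes(4, "big")


if __name__ == "__main__":
    for word in ("cat", "東京"):
        print(word, "is", len(word.encode("utf-8")), "bytes in UTF-8")
        for tg in char_trigrams(word):
            print("  ", repr(tg), "->", index_key(tg).hex())
```

In this model a two-kanji word is six bytes in UTF-8 but only two characters, so every trigram it produces includes at least one padding space, and each of those trigrams is keyed by a hash rather than by its raw bytes.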
