Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

Fujii Masao Fri, 11 May 2012 08:54:34 -0700

On Fri, May 11, 2012 at 4:11 AM, Tom Lane <[email protected]> wrote:
> Fujii Masao <[email protected]> writes:
>> On Fri, May 11, 2012 at 12:07 AM, MauMau <[email protected]> wrote:
>>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>>> consider what you taught. And I'll consider if the tentative measure of
>>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>>> against Japanese text.
>
>> In Japanese, it's common to do a text search with two characters keyword.
>> But since pg_trgm is 3-gram, you basically would not be able to use index
>> for such text search. So you might need something like pg_bigm or pg_unigm
>> for Japanese text search.


Even if an index could be used for a two-character text search, the bitmap
index scan would pick up all rows, so it would be too slow.

> I believe the trigrams are three *bytes* not three characters.  So a
> couple of kanji should work just fine for this.

Really? As far as I can tell from the pg_trgm code, a trigram is three
characters, and its CRC32 is used as the index key if its size is more than
three bytes.
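
To make the two readings concrete, here is a rough Python sketch (not the
actual pg_trgm C code). The function names are made up for illustration,
zlib.crc32 is only a stand-in for whatever CRC routine pg_trgm really uses,
and keeping three bytes of the checksum is an assumption of the sketch. It
also ignores the padding and case folding that pg_trgm applies to words.

```python
import zlib


def char_trigrams(text):
    """Trigrams as three *characters* (my reading of the code)."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def byte_trigrams(text):
    """Trigrams as three *bytes* of the UTF-8 encoding (the other reading)."""
    raw = text.encode("utf-8")
    return [raw[i:i + 3] for i in range(len(raw) - 2)]


def index_key(trigram):
    """Hypothetical index key: raw bytes if they fit in three bytes,
    otherwise a truncated checksum of the multibyte trigram."""
    raw = trigram.encode("utf-8")
    if len(raw) <= 3:
        return raw
    # Assumption: zlib.crc32 stands in for the real CRC32 routine, and
    # only three bytes of the checksum are kept.
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


if __name__ == "__main__":
    keyword = "日本"  # a two-character keyword, six bytes in UTF-8
    print(char_trigrams(keyword))       # no character trigrams at all
    print(len(byte_trigrams(keyword)))  # several byte trigrams
    print(index_key("日本語"))           # hashed, since it is nine bytes
```

Under the character reading, a two-character keyword yields no trigram, which
is why I think the index cannot help there. Under the byte reading, the same
keyword would yield several trigrams.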

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
