On 03/09/2015 02:54 PM, Alvaro Herrera wrote:
Beena Emerson wrote:
In the pg_trgm module, within function generate_trgm, the memory for trigrams
is allocated as follows:

trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);

I have been trying to understand why this is so, because it seems to be
allocating more space than is required.

Maybe it's considering a worst case for multibyte characters?  I don't
really know if trgm supports multibyte, but I assume it does.  If it
does, then probably the trigrams consist of chars, not bytes.

Nope. Trigrams are always three bytes, even ones containing multibyte characters. If there are any multibyte characters in the trigram, we store a 3-byte checksum of the three characters instead. That loses some information, so you can have a collision where one multibyte trigram incorrectly matches another one, but the trigram algorithms are generally not too concerned about exact results.

- Heikki
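
For illustration, the scheme described above might be sketched roughly as follows. This is not the pg_trgm source: pack_trigram and fnv1a are hypothetical names, and FNV-1a is only a stand-in for whatever checksum the module really computes. The point is just that three single-byte characters are stored verbatim, while anything longer is reduced to three bytes of a hash, so two different multibyte trigrams can end up with the same stored value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * FNV-1a 32-bit hash. Used here purely as a stand-in checksum (an
 * assumption for this sketch, not necessarily what pg_trgm uses).
 */
static uint32_t
fnv1a(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    size_t   i;

    for (i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Hypothetical helper: pack three characters, occupying bytelen bytes in
 * total, into a fixed 3-byte trigram.
 */
static void
pack_trigram(unsigned char out[3], const char *chars, size_t bytelen)
{
    /* three characters always occupy at least three bytes */
    assert(bytelen >= 3);

    if (bytelen == 3)
    {
        /* all three characters are single-byte: store them as-is */
        memcpy(out, chars, 3);
    }
    else
    {
        /* at least one multibyte character: keep 3 bytes of the hash */
        uint32_t h = fnv1a((const unsigned char *) chars, bytelen);

        out[0] = (unsigned char) (h >> 16);
        out[1] = (unsigned char) (h >> 8);
        out[2] = (unsigned char) h;
    }
}
```

Calling pack_trigram(out, "abc", 3) leaves the bytes 'a', 'b', 'c' in out, whereas a trigram containing, say, a two-byte UTF-8 character is passed with bytelen 4 and comes back as three hash bytes. Since only 24 bits survive, distinct multibyte trigrams can collide, which is the information loss mentioned above.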



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)