To be precise about tsvector:
1) A GiST index is lossy for any kind of tsearch query. A GIN index is not
lossy for the @@ operation, but it is lossy for @@@.
2) The number of positions per word is limited to 256. A larger number of
positions does not help ranking, but it produces a big tsvector. If a word has
a lot of positions in a document, then it is close to being a stopword. We
could easily increase this limit to 65536 positions.
3) The maximum value of a position is 2^14, because we use a uint16 to store
each position, and 2 bits of that integer must be reserved to store the weight
of the position. It's possible to change uint16 to uint32, but that would
double the tsvector size, which is impractical, I suppose. So the part of a
document used for ranking consists of its first 16384 words, which is about
the first 50-100 kilobytes.
4) The limit on the total size of a tsvector comes from the WordEntry->pos
field (ts_type.h). It contains the number of bytes between the first lexeme in
the tsvector and the needed lexeme. So the limitation applies to the total
length of the lexemes plus their positional information.
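
To illustrate point 3, here is a minimal sketch of how a weight and a position
can share one 16-bit integer. The names (packed_pos_t, pack_pos, get_weight,
get_pos) are hypothetical and chosen for this example only; they are not the
identifiers from ts_type.h. Clamping an oversized position to the largest
storable value is also an assumption made for the sketch.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the actual ts_type.h definitions. */
#define POS_BITS 14                        /* low 14 bits hold the position */
#define MAX_POS  ((1u << POS_BITS) - 1u)   /* largest value that fits in 14 bits */

typedef uint16_t packed_pos_t;             /* 2 bits of weight + 14 bits of position */

/* Pack a weight (0..3) and a position into one 16-bit value.
 * Assumption: positions that do not fit are clamped to MAX_POS. */
static packed_pos_t pack_pos(unsigned weight, unsigned pos)
{
    if (pos > MAX_POS)
        pos = MAX_POS;
    return (packed_pos_t)(((weight & 0x3u) << POS_BITS) | pos);
}

/* Extract the weight stored in the top 2 bits. */
static unsigned get_weight(packed_pos_t p)
{
    return (unsigned)p >> POS_BITS;
}

/* Extract the position stored in the low 14 bits. */
static unsigned get_pos(packed_pos_t p)
{
    return (unsigned)p & MAX_POS;
}
```

With this layout every position costs 2 bytes, and moving to a 32-bit integer
would cost 4 bytes per position, which is the doubling mentioned above.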
--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)