Re: [HACKERS] Term positions in GIN fulltext index

Yoann Moreau Fri, 04 Nov 2011 03:15:47 -0700

On 03/11/11 19:19, Florian Pflug wrote:

There's a difference between values of type tsvector, and what GIN indices
on columns or expressions of type tsvector store.

I was wondering what the point was of storing the tsvector in the table; I now understand. I should then use the GIN index to rank my documents, and work on the stored tsvectors for positions.

As I pointed out above, you'll first need to make sure to store the result of
to_tsvector in a column. Then, what you need seems to be a function that
takes a tsvector value and returns the contained lexemes as individual rows.

Postgres doesn't seem to contain such a function currently (don't believe that,
though - go and recheck the documentation. I don't know all thousands of
built-in functions by heart). But it's easy to add one. You could either use
PL/pgSQL to parse the tsvector's textual representation, or write a C function.
If you go the PL/pgSQL route, regexp_split_to_table() might come in handy.

This seems easier to program than what I was thinking about, so I'm going to do that. But I'm wondering about the size of the database with the GIN index plus the tsvector column, and about the cost of parsing the whole tsvector of each document I need positions from (as I need them for only a very few terms).
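To make the parsing idea concrete, here is a rough sketch of it, written in Python rather than PL/pgSQL only to keep it short and easy to test. It assumes the usual textual form of a tsvector ('lexeme':pos,pos with an optional weight letter A-D after a position, and embedded single quotes doubled) and does not handle backslash escapes. The function name tsvector_positions and the wanted parameter are made up for this illustration; nothing here is an existing Postgres function.

```python
import re

# One entry of a tsvector's text form: 'lexeme':pos[weight],pos[weight],...
# Embedded single quotes are doubled (''); the position list is optional.
# Assumption: no backslash escapes inside lexemes.
_ENTRY = re.compile(r"'((?:[^']|'')*)'(?::(\d+[A-D]?(?:,\d+[A-D]?)*))?")


def tsvector_positions(text, wanted=None):
    """Yield (lexeme, position, weight) for each position in a tsvector string.

    `wanted` is an optional set of lexemes; entries outside it are skipped,
    which matches the case of needing positions for only a few terms.
    """
    for match in _ENTRY.finditer(text):
        lexeme = match.group(1).replace("''", "'")
        if wanted is not None and lexeme not in wanted:
            continue
        if match.group(2) is None:
            # Lexeme stored without positions (e.g. after strip()).
            continue
        for item in match.group(2).split(","):
            # Weight D is the default and is not shown in the text form.
            weight = item[-1] if item[-1] in "ABCD" else "D"
            yield lexeme, int(item.rstrip("ABCD")), weight


if __name__ == "__main__":
    sample = "'cat':3 'fat':2,11A 'rat':12"
    for row in tsvector_positions(sample, wanted={"fat"}):
        print(row)
```

Each call still scans the whole textual representation even when only a few lexemes are wanted, so the performance concern above applies to this sketch as well.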

Maybe some external fulltext engine managing lexemes and positions would be more efficient for my purpose. I'll try some different things and let you know the results.


Thanks all for your help
Regards,
Yoann Moreau


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
