Re: [GENERAL] full text search: the concept of a "word"

2006-04-20 Thread Teodor Sigaev

> My text fields are trigger-generated using information from a number of
> tables: these fields can be, say, a couple of thousand characters
> long.
> So far, there's no problem.
> What I'd like to do is define, possibly using regexps, what
> constitutes a word. For instance, my word separator is a semicolon,
> not a space; a dash is not a separator, and neither are language-specific
> characters (which a language-agnostic tool might treat as
> separators)...
> BTW, I use UTF-8 as my database encoding, if that's of any importance.


I do not see a big problem: just write your own parser.
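To make "write your own parser" concrete, here is a minimal Python sketch of the splitting rule from the question. It is not the tsearch2 parser interface (a real tsearch2 parser is a set of C functions). The function name semicolon_parser and the choice to trim whitespace around each word are assumptions made for illustration.

```python
from typing import Iterator, Tuple


def semicolon_parser(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (position, word) pairs, treating ';' as the only separator.

    Hypothetical sketch of the rule described in the question: dashes,
    inner spaces and non-ASCII letters stay inside a word. Whitespace
    around each word is trimmed (an assumption), and empty pieces such
    as those produced by ';;' or a trailing ';' are skipped.
    """
    position = 0
    for piece in text.split(";"):
        word = piece.strip()
        if not word:
            continue
        position += 1  # 1-based, counting only the words actually emitted
        yield position, word
```

For the field "Zagreb-Centar; Čakovec;;Split;" it yields (1, 'Zagreb-Centar'), (2, 'Čakovec') and (3, 'Split'). In tsearch2 the equivalent logic would have to be written in C, inside the parser functions the module calls.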

There may be a problem with UTF-8: only the CVS HEAD version of tsearch2 supports
UTF-8. But you can find a patch for 8.1 at http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/





--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] full text search: the concept of a "word"

2006-04-20 Thread Tomi NA
I'm considering using tsearch2 in the project I'm currently working on.
However, I'm not sure whether tsearch2 can handle my very specific
requirements, so I hope someone can tell me whether the following is
possible and how I should go about it.

My text fields are trigger-generated using information from a number of
tables: these fields can be, say, a couple of thousand characters
long.
So far, there's no problem.
What I'd like to do is define, possibly using regexps, what
constitutes a word. For instance, my word separator is a semicolon,
not a space; a dash is not a separator, and neither are language-specific
characters (which a language-agnostic tool might treat as
separators)...
BTW, I use UTF-8 as my database encoding, if that's of any importance.
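The difference between a language-agnostic notion of a word and the rule asked for here can be shown with two regular expressions. This is a Python sketch with assumed names (ascii_words, semicolon_words); it only illustrates the splitting rule and says nothing about how tsearch2 itself tokenizes text.

```python
import re
from typing import List

# A "language-agnostic" notion of a word: a run of ASCII letters or digits.
# Everything else, including dashes and letters such as 'Č', acts as a separator.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

# The rule from the question as a regexp: a word is any run of characters
# other than ';'. Surrounding whitespace is trimmed afterwards (an assumption).
SEMICOLON_WORD = re.compile(r"[^;]+")


def ascii_words(text: str) -> List[str]:
    return ASCII_WORD.findall(text)


def semicolon_words(text: str) -> List[str]:
    return [m.strip() for m in SEMICOLON_WORD.findall(text) if m.strip()]
```

On the field "Čakovec;Zagreb-Centar", ascii_words returns ['akovec', 'Zagreb', 'Centar'], dropping the 'Č' and breaking the hyphenated name apart, while semicolon_words returns ['Čakovec', 'Zagreb-Centar']. The explicit [A-Za-z0-9] class is used because Python's \w is already Unicode-aware and would not show the problem.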

What it comes down to is this: is it possible to somehow define what
constitutes a word?

TIA,
Tomislav
