Sabbiolina,

you have two options:

1. Write your very own parser
2. Write a dictionary that breaks a host into its parts

Fortunately, instead of writing the dictionary in option 2 yourself, you
can use our dict_regex dictionary
(http://vo.astronet.ru/arxiv/dict_regex.html).
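
To give a rough idea of what "breaking a host into parts" means, here is a
minimal sketch in Python. It is only an illustration of the idea, not the
dict_regex syntax and not how a PostgreSQL dictionary is implemented; the
names split_host and HOST_STOPWORDS are made up for this example, and the
stop list is an assumption.

```python
# Hypothetical stop list; a real setup would choose its own.
HOST_STOPWORDS = {"www"}

def split_host(token, stopwords=HOST_STOPWORDS):
    """Return the full host followed by its dot-separated parts,
    skipping stop words and duplicates."""
    token = token.lower()
    parts = [p for p in token.split(".") if p and p not in stopwords]
    # Keep the whole host first so a search for the exact host still matches.
    result = [token]
    for part in parts:
        if part not in result:
            result.append(part)
    return result

print(split_host("www.google.com"))  # ['www.google.com', 'google', 'com']
```

With the parts indexed alongside the full host, a search for "google" would
have something to match against.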

Oleg

On Wed, 18 Jun 2008, Sabbiolina wrote:

Hello,



I've seen that the default parser for the full-text search can identify
e-mail addresses, hosts, URLs, but I have a serious problem with it:



Suppose I index the following sentence: "the search engine I use the most is
www.google.com"



If I then search for "google", no result is found.

If I search for "www.google.com" instead, the record is found correctly.



I guess the reason is that the parser treats www.google.com as a single
token (of type 'host'), but as everyone can easily see, this is a major
problem. The word "google" actually is in the above sentence, and the
end-user of the database obviously asks me "why does your FTS not find
that record when I can clearly see that my search term is there?"



Reading the docs, I've seen that the parser can produce multiple tokens for
the same word (for example, the word "make-up" produces 4 tokens: make-up,
make, -, up). Why not do the same with URLs and e-mails? Why is
www.google.com treated only as a single word? Why not produce multiple
tokens like www.google.com, www, ., google, ., com? (Obviously www and . can
be nulled or stopworded.)


Does anybody know of a better parser for Postgres? Or at least a trick to
make its FTS find the record above by searching only a part of the URL?


        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
