Re: [GENERAL] tsearch2 keep throw-away characters

Ivan Zolotukhin Sat, 19 May 2007 22:39:09 -0700

Hello,

Your problem is not about stop words. The tsearch parser treats the '+'
and '#' symbols as tokens of the blank type (use the ts_debug() function
to see this) and drops them without any further processing. AFAIK, the
typical solution is to rewrite your text, and then your queries, to use
auxiliary words such as 'SYScpp' and 'SYScsharp', which will be included
in tsvectors and indexed without any problems. Usually you can do the
replacements in the tsvector trigger when indexing documents, and via
query rewriting (in tsearch or in your application) when querying the
database.


Trivial examples:

test=# select to_tsvector('english','I know how to code in SYScsharp, java and SYScpp');
                    to_tsvector
------------------------------------------------------
'code':5 'java':8 'know':2 'syscpp':10 'syscsharp':7
(1 row)

and, sure:

test=# select 'I know how to code in SYScsharp, java and SYScpp' @@ 'SYScpp';
?column?
----------
t
(1 row)
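
If you do the rewriting in your application rather than in a trigger, the
replacement step could look roughly like the sketch below. This is only an
illustration: the function name rewrite_terms and the REPLACEMENTS table are
made up for this example, and the 'SYS...' placeholders are simply the ones
used above. The same function would have to be applied both to the document
text before it goes to to_tsvector() and to the user's search terms before
they go into the query, so that both sides agree.

```python
import re

# Hypothetical mapping from terms the parser would mangle to plain
# auxiliary words. Keys are lowercase; matching is case-insensitive.
REPLACEMENTS = {
    "c++": "SYScpp",
    "c#": "SYScsharp",
}

# Match a term only when it is not glued to other word characters or to
# extra '+' / '#' symbols, so that 'abc++' or 'c+++' are left untouched.
_PATTERN = re.compile(
    r"(?<![\w+#])("
    + "|".join(re.escape(term) for term in sorted(REPLACEMENTS, key=len, reverse=True))
    + r")(?![\w+#])",
    re.IGNORECASE,
)


def rewrite_terms(text: str) -> str:
    """Replace each known term with its auxiliary word."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1).lower()], text)


if __name__ == "__main__":
    # Document side: rewrite before handing the text to to_tsvector().
    print(rewrite_terms("I know how to code in C#, java and C++"))
    # Query side: rewrite the user's search term the same way.
    print(rewrite_terms("c++"))
```

Keeping the mapping in one place matters more than the exact regular
expression: if the indexing side and the query side ever use different
tables, searches for the affected terms will silently stop matching.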

There might be a more sophisticated solution, such as preventing the
parser from treating '++' as a blank lexeme, but Oleg will explain this
much better, as soon as he has time.

--
Regards,
Ivan


On 5/16/07, Kimball <[EMAIL PROTECTED]> wrote:


postgres=# select to_tsvector('default','I know how to code in C#, java and
C++');
              to_tsvector
-------------------------------------
 'c':7,10 'code':5 'java':8 'know':2
 (1 row)

postgres=# select to_tsvector('simple','I know how to code in C#, java and
C++');
                               to_tsvector
-------------------------------------------------------------------------
 'c':7,10 'i':1 'in':6 'to':4 'and':9 'how':3 'code':5 'java':8 'know':2
(1 row)


I'd like to get lexemes/tokens 'c#' and 'c++' out of this query.  Everything
I can find has to do with stop words.   How do I keep characters that
tsearch throws out?  I've already tried 'c\#' and 'c\\#' etc, which don't
work.

Kimball


