Hello,

Your problem is not about stop words. The tsearch parser treats the '+' and '#' symbols as tokens of the blank type (use the ts_debug() function to see this) and drops them without any further processing.

AFAIK, the typical solution is to rewrite your text, and then your queries, to use auxiliary words such as 'SYScpp' and 'SYScsharp', which will be included in tsvectors and indexed without any problems. Usually you can do the replacements in the tsvector trigger when indexing documents, and via query rewriting (in tsearch or in your application) when querying the database.
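
For the application-side variant of that rewriting, here is a minimal sketch in Python. The names rewrite and REPLACEMENTS are hypothetical, and the lookaround rules for what counts as a standalone 'C++' or 'C#' are my assumption, not something tsearch prescribes. The idea is to run the same function over document text before it reaches to_tsvector and over user input before it becomes a query, so both sides agree on the auxiliary words.

```python
import re

# Auxiliary words from the message above; the mapping itself is illustrative.
# Any replacement that the parser keeps as an ordinary word would do.
REPLACEMENTS = {
    "c++": "SYScpp",
    "c#": "SYScsharp",
}

# Match a key only when it is not glued to other word characters, so that
# something like "abc#" is left alone. re.escape handles '+' and '#'.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(key) for key in REPLACEMENTS) + r")(?!\w)",
    re.IGNORECASE,
)


def rewrite(text: str) -> str:
    """Replace 'C++' and 'C#' (any case) with their auxiliary words."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1).lower()], text)


if __name__ == "__main__":
    print(rewrite("I know how to code in C#, java and C++"))
```

The rewritten string is what you would pass to to_tsvector when indexing, and the rewritten user input is what you would pass when building the query.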

Trivial examples:

test=# select to_tsvector('english','I know how to code in SYScsharp, java and SYScpp');
                     to_tsvector
------------------------------------------------------
 'code':5 'java':8 'know':2 'syscpp':10 'syscsharp':7
(1 row)

and, sure:

test=# select 'I know how to code in SYScsharp, java and SYScpp' @@ 'SYScpp';
 ?column?
----------
 t
(1 row)

There might be a more sophisticated solution, such as preventing the parser from treating '++' as blank lexemes, but Oleg will explain this much better as soon as he has time.

--
Regards,
Ivan

On 5/16/07, Kimball <[EMAIL PROTECTED]> wrote:

postgres=# select to_tsvector('default','I know how to code in C#, java and C++');
             to_tsvector
-------------------------------------
 'c':7,10 'code':5 'java':8 'know':2
(1 row)

postgres=# select to_tsvector('simple','I know how to code in C#, java and C++');
                               to_tsvector
-------------------------------------------------------------------------
 'c':7,10 'i':1 'in':6 'to':4 'and':9 'how':3 'code':5 'java':8 'know':2
(1 row)

I'd like to get lexemes/tokens 'c#' and 'c++' out of this query. Everything I can find has to do with stop words. How do I keep characters that tsearch throws out? I've already tried 'c\#' and 'c\\#' etc, which don't work.

Kimball