Re: preserve special characters

2013-06-18 Thread Mingfeng Yang
Hi Jack,

That seems like the solution I am looking for. Thanks so much!

I couldn't find this "types" attribute for the WDF documented anywhere.

Ming-


On Tue, Jun 18, 2013 at 4:52 PM, Jack Krupansky wrote:

> The WDF has a "types" attribute which can specify one or more character
> type mapping files. You could create a file like:
>
> @ => ALPHA
> _ => ALPHA
>
> For example (from the book!):
>
> Example - Treat at-sign and underscores as text
>
>   positionIncrementGap="100" autoGeneratePhraseQueries="true">
>
>  
>types="at-under-alpha.txt"/>
>
>  
>
> The file +at-under-alpha.txt+ would contain:
>
>  @ => ALPHA
>  _ => ALPHA
>
> The analysis results:
>
>Source: Hello @World_bar, r@end.
>Tokens: 1: Hello 2: @World_bar 3: r@end
>
>
> -- Jack Krupansky


Re: preserve special characters

2013-06-18 Thread Jack Krupansky
The WDF has a "types" attribute which can specify one or more character type 
mapping files. You could create a file like:


@ => ALPHA
_ => ALPHA

For example (from the book!):

Example - Treat at-sign and underscores as text
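
A minimal sketch of a field type along these lines. Only the positionIncrementGap, autoGeneratePhraseQueries, and types values come from the example; the field type name "text_at_under" is hypothetical, and the whitespace tokenizer in front of the WDF is an assumption based on the setup described in the original question:

```xml
<!-- Sketch only: the name "text_at_under" and the choice of tokenizer are assumptions. -->
<fieldType name="text_at_under" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- Split on whitespace only, so "@" and "_" reach the filter intact. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- types= names the character type mapping file; here it remaps "@" and "_" to ALPHA. -->
    <filter class="solr.WordDelimiterFilterFactory"
            types="at-under-alpha.txt"/>
  </analyzer>
</fieldType>
```

With "@" and "_" typed as ALPHA, the WDF should no longer treat them as delimiters, while other punctuation such as a trailing comma or period is still stripped.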

 
   
 
 
   
 

The file +at-under-alpha.txt+ would contain:

 @ => ALPHA
 _ => ALPHA

The analysis results:

   Source: Hello @World_bar, r@end.
   Tokens: 1: Hello 2: @World_bar 3: r@end


-- Jack Krupansky




Re: preserve special characters

2013-06-18 Thread Learner
You can use the keyword tokenizer.

Creates org.apache.lucene.analysis.core.KeywordTokenizer.

Treats the entire field as a single token, regardless of its content.

Example: "http://example.com/I-am+example?Text=-Hello" ==>
"http://example.com/I-am+example?Text=-Hello"
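
As a rough sketch, a field type using this tokenizer might be declared as follows. The field type name "text_keyword" is hypothetical and no filters are assumed:

```xml
<!-- Sketch only: the name "text_keyword" is an assumption. -->
<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <!-- Emits the entire field value as one token, special characters included. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Because the whole field value becomes a single token, a plain term query for "@solr" would match only a field whose entire value is "@solr", not a longer tweet that merely contains it.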





preserve special characters

2013-06-18 Thread Mingfeng Yang
We need to index and search lots of tweets, which can look like "@solr: solr is
great" or "@solr_lucene, good combination".

We want to search with "@solr" or "@solr_lucene". How can we preserve
"@" and "_" in the index?

If we use WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene
is broken down into "solr" and "lucene", which makes the search results
contain lots of non-relevant docs.

If we use StandardTokenizer, the "@" symbol is stripped.

Thanks,
Ming-