Search special chars

2012-07-23 Thread Li, Qiang
Hi All,

I want to search for keywords like Non-taxable, which contain a - in the word. 
Can I make this work in Solr through some configuration? Or is there another way?

Thanks & Regards,
Ivan

This email message and any attachments are for the sole use of the intended 
recipients and may contain proprietary and/or confidential information which 
may be privileged or otherwise protected from disclosure. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not an 
intended recipient, please contact the sender by reply email and destroy the 
original message and any copies of the message as well as any attachments to 
the original message. Local registered entity information: 
http://www.msci.com/legal/local_registered_entities.html


Re: Search special chars

2012-07-23 Thread Lance Norskog
The whitespace tokenizer handles this. It splits text only on spaces,
tabs and newlines, so a hyphenated word such as Non-taxable stays a
single token. You can use this tokenizer in the query analyzer of your
field type.

Another option is to create a regular expression CharFilter that turns
non-* into non*.
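
To make the two options above concrete, here is a rough Python illustration.
It is not Solr code: the function names are hypothetical, and
split_on_non_letters only stands in for any tokenizer that treats the hyphen
as a separator. In Solr itself the second option would be configured as a
char filter (for example solr.PatternReplaceCharFilterFactory) in the field
type rather than written in Python.

```python
import re


def whitespace_tokenize(text):
    # Split only on runs of whitespace (spaces, tabs, newlines),
    # so hyphenated words survive as one token.
    return text.split()


def split_on_non_letters(text):
    # For contrast: a tokenizer that breaks on anything that is not a letter.
    return re.findall(r"[A-Za-z]+", text)


def strip_prefix_hyphen(text):
    # Rough analogue of a regex char filter: rewrite "non-<word>" to "non<word>"
    # before tokenizing. Matching is case-insensitive and keeps the original case.
    return re.sub(r"(?i)\b(non)-(?=\w)", r"\1", text)


sample = "Non-taxable income"
print(whitespace_tokenize(sample))   # ['Non-taxable', 'income']
print(split_on_non_letters(sample))  # ['Non', 'taxable', 'income']
print(strip_prefix_hyphen(sample))   # Nontaxable income
```

Whichever option is used, the same transformation has to apply at index time
and at query time, otherwise the indexed tokens and the query tokens will not
match.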


-- 
Lance Norskog
goks...@gmail.com


Re: Search special chars

2012-07-23 Thread Ram Marpaka
Ivan

The hyphen character (-) is a Solr query operator: placed in front of a term, 
it excludes results that match that term. You can strip hyphens both while 
indexing and while searching. If you need to retain them, I think there are 
different ways to make it work. This is the approach I am using:


1. Excerpt from my schema.xml (you may not need all filters):


<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
                synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <!-- <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldtype>
    

2. Query:
I am removing the hyphens before appending the value to the q= query string, 
which is working fine for me:

http://localhost:9080/solr/custdatacore/select/?q=PHONE_NUMBER:239083*


Note: since my field is a text field and this value is a number, I just append 
* at the end.

The actual data stored in the index is 111-123-9083

Spell check works with the request below (without stripping the hyphens):
suggest/?spellcheck.q=111-123-9083&spellcheck=true
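
The hyphen handling in step 2 could be done on the client roughly as in the
Python sketch below. The helper names are hypothetical and this is not my
production code. strip_hyphens is the approach described above;
escape_hyphens shows the alternative of keeping the hyphen and
backslash-escaping it, which is how the Lucene query syntax escapes its
special characters. The field name PHONE_NUMBER comes from the query above,
while the sample value 123-9083 is only an illustration.

```python
from urllib.parse import urlencode


def strip_hyphens(value):
    # Drop every hyphen so the value cannot be read as an exclusion operator.
    return value.replace("-", "")


def escape_hyphens(term):
    # Alternative: keep the hyphen but escape it with a backslash.
    return term.replace("-", "\\-")


def build_prefix_query(field, raw_value):
    # Build the q= parameter as a prefix query (trailing *), URL-encoded.
    return urlencode({"q": f"{field}:{strip_hyphens(raw_value)}*"})


print(strip_hyphens("111-123-9083"))   # 1111239083
print(escape_hyphens("Non-taxable"))   # Non\-taxable
print(build_prefix_query("PHONE_NUMBER", "123-9083"))
```

Escaping only gets the hyphen past the query parser. Whether the term then
matches still depends on how the analyzer for that field tokenizes it.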


 

Thanks
Ram M Marpaka


