[ 
https://issues.apache.org/jira/browse/SOLR-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335436#comment-14335436
 ] 

Arun Rangarajan commented on SOLR-7154:
---------------------------------------

I originally ran this on Solr 4.2.1. After seeing your comment, I repeated the 
same test on Solr 5.0.0 and got the same results.

> Wildcard query matches special characters
> -----------------------------------------
>
>                 Key: SOLR-7154
>                 URL: https://issues.apache.org/jira/browse/SOLR-7154
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Arun Rangarajan
>            Priority: Minor
>
> I have a string field raw_name defined like this:
> {code}
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" 
> omitNorms="true"/>
> ...
> <field name="raw_name" type="string" indexed="true" stored="true" />
> {code}
> I have a document like this:
> {code}
> {raw_name: beyoncé}
> {code}
> Notice that the last character is a special character (accented e).
> When I issue this wildcard query:
> {code}
> q=raw_name:beyonce*
> {code}
> i.e. with the last character being the plain ASCII 'e', Solr returns the 
> above document.
> Exact query:
> {code}
> /select?q=raw_name:beyonce*&wt=json&fl=raw_name
> {code}
> Response:
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 0,
>     "params": {
>       "fl": "raw_name",
>       "q": "raw_name:beyonce*",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 2,
>     "start": 0,
>     "docs": [
>       {
>         "raw_name": "beyoncé"
>       },
>       {
>         "raw_name": "beyoncé"
>       }
>     ]
>   }
> }
> {code}
> I used the analysis tool in Solr admin (with Jetty). The raw bytes look like 
> this:
> Raw bytes for beyonce: [62 65 79 6f 6e 63 65]
> Raw bytes for beyoncé: [62 65 79 6f 6e 63 65 cc 81]
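> The second sequence is the first one followed by cc 81, the UTF-8 encoding 
> of the combining acute accent (U+0301). The indexed value is therefore in 
> decomposed form and its bytes begin with the bytes of beyonce, which seems 
> to explain why the prefix query beyonce* matches beyoncé.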
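
The byte comparison above can be reproduced outside Solr. The following is a minimal Python sketch, not taken from the issue: the string literals are reconstructed from the raw bytes quoted in the report, and `hex_bytes` is a hypothetical helper introduced only for illustration. It suggests that a plain prefix test matches the decomposed spelling (e followed by U+0301) but would not match the precomposed spelling (U+00E9).

```python
import unicodedata

# Decomposed spelling: ASCII 'e' followed by U+0301 COMBINING ACUTE ACCENT.
# This is the form implied by the raw bytes quoted in the report.
decomposed = "beyonce\u0301"

# Precomposed spelling: the single code point U+00E9, obtained via NFC.
composed = unicodedata.normalize("NFC", decomposed)

prefix = "beyonce"


def hex_bytes(text):
    """Return the UTF-8 bytes of text as space-separated hex (illustrative helper)."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


print(hex_bytes(prefix))      # 62 65 79 6f 6e 63 65
print(hex_bytes(decomposed))  # 62 65 79 6f 6e 63 65 cc 81
print(hex_bytes(composed))    # 62 65 79 6f 6e 63 c3 a9

# A prefix test on the unanalyzed value behaves like the wildcard query here.
print(decomposed.startswith(prefix))  # True
print(composed.startswith(prefix))    # False
```

If this reading is right, normalizing values to NFC before indexing would be one way to keep beyonce* from matching, though that is an assumption about the desired behavior rather than something stated in the issue.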


