Arun Rangarajan created SOLR-7154:
-------------------------------------

             Summary: Wildcard query matches special characters
                 Key: SOLR-7154
                 URL: https://issues.apache.org/jira/browse/SOLR-7154
             Project: Solr
          Issue Type: Bug
            Reporter: Arun Rangarajan
            Priority: Minor

I have a string field raw_name defined like this:

{code}
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
...
<field name="raw_name" type="string" indexed="true" stored="true" />
{code}

I have a document like this:

{code}
{raw_name: beyoncé}
{code}

Notice that the last character is a special character (an accented e).

When I issue this wildcard query:

{code}
q=raw_name:beyonce*
{code}

i.e. with the last character being the plain ASCII 'e', Solr returns the above document.

Exact query:

{code}
/select?q=raw_name:beyonce*&wt=json&fl=raw_name
{code}

Response:

{code}
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "fl": "raw_name",
      "q": "raw_name:beyonce*",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      { "raw_name": "beyoncé" },
      { "raw_name": "beyoncé" }
    ]
  }
}
{code}

I used the analysis tool in Solr admin (with Jetty). The raw bytes look like this:

Raw bytes for beyonce: [62 65 79 6f 6e 63 65]
Raw bytes for beyoncé: [62 65 79 6f 6e 63 65 cc 81]

The indexed value stores the accented character in decomposed form: a plain ASCII 'e' (65) followed by the combining acute accent U+0301 (cc 81). The bytes of beyonce are therefore a literal prefix of the bytes of beyoncé, which seems to explain why beyonce* matches it.
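
The byte-level observation can be reproduced outside Solr. The following is a minimal Python sketch, not Solr code: the helper names `utf8_hex` and `prefix_matches` are hypothetical, and the prefix check is only an assumed stand-in for how a wildcard query on an unanalyzed string field compares terms. It shows that the raw bytes quoted in the report correspond to the decomposed form of the name, and that the precomposed (NFC) form would not share the ASCII prefix.

```python
import unicodedata


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def prefix_matches(term: str, prefix: str) -> bool:
    """Byte-wise prefix test (assumed stand-in for a prefix query on a raw string term)."""
    return term.encode("utf-8").startswith(prefix.encode("utf-8"))


# Decomposed form: ASCII 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "beyonce\u0301"
# Precomposed form: single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(utf8_hex("beyonce"))   # 62 65 79 6f 6e 63 65
print(utf8_hex(decomposed))  # 62 65 79 6f 6e 63 65 cc 81
print(utf8_hex(composed))    # 62 65 79 6f 6e 63 c3 a9

print(prefix_matches(decomposed, "beyonce"))  # True
print(prefix_matches(composed, "beyonce"))    # False
```

The second hex line matches the bytes reported for the indexed value, which suggests the document was indexed in decomposed form. Under that assumption the match is a consequence of how the text was encoded at index time rather than of the wildcard handling itself.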