[ https://issues.apache.org/jira/browse/SOLR-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335452#comment-14335452 ]
Arun Rangarajan commented on SOLR-7154:
---------------------------------------

Right, that seems to be the issue. I added another document with "Latin small letter with acute" and that document does not match the wild-card query. So I think this needs to be fixed in my data source itself.

> Wildcard query matches special characters
> -----------------------------------------
>
>                 Key: SOLR-7154
>                 URL: https://issues.apache.org/jira/browse/SOLR-7154
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Arun Rangarajan
>            Priority: Minor
>
> I have a string field raw_name defined like this:
> {code}
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> ...
> <field name="raw_name" type="string" indexed="true" stored="true" />
> {code}
> I have a document like this:
> {code}
> {raw_name: beyoncé}
> {code}
> Notice that the last character is a special character (accented e).
> When I issue this wildcard query:
> {code}
> q=raw_name:beyonce*
> {code}
> i.e. with the last character simply being the ASCII 'e', Solr returns me the
> above document.
> Exact query:
> {code}
> /select?q=raw_name:beyonce*&wt=json&fl=raw_name
> {code}
> Response:
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 0,
>     "params": {
>       "fl": "raw_name",
>       "q": "raw_name:beyonce*",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 2,
>     "start": 0,
>     "docs": [
>       {
>         "raw_name": "beyoncé"
>       },
>       {
>         "raw_name": "beyoncé"
>       }
>     ]
>   }
> }
> {code}
> I used the analysis tool in Solr admin (with Jetty). The raw bytes look like
> this:
> Raw bytes for beyonce: [62 65 79 6f 6e 63 65]
> Raw bytes for beyoncé: [62 65 79 6f 6e 63 65 cc 81]
> So when you look at the bytes, it seems to explain why beyonce* might match
> beyoncé.
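
The raw bytes quoted above can be reproduced outside Solr. The trailing `cc 81` is the UTF-8 encoding of U+0301 (combining acute accent), so the stored value is a plain ASCII `e` followed by a combining mark, and the query prefix `beyonce` is a literal prefix of it. The following is a minimal Python sketch, not anything taken from Solr itself; the helper `utf8_hex` and the variable names are made up for illustration. It also shows that the precomposed form (U+00E9) does not share that prefix, which fits the observation in the comment. Normalizing to NFC in the data source is one possible fix, assuming the source data really is in decomposed form.

```python
import unicodedata


def utf8_hex(s):
    """Return the UTF-8 bytes of s as space-separated hex (illustrative helper)."""
    return " ".join(f"{b:02x}" for b in s.encode("utf-8"))


query_prefix = "beyonce"

# Decomposed form: ASCII 'e' followed by U+0301 (combining acute accent).
decomposed = unicodedata.normalize("NFD", "beyonc\u00e9")
# Precomposed form: single code point U+00E9.
precomposed = unicodedata.normalize("NFC", decomposed)

print(utf8_hex(query_prefix))  # 62 65 79 6f 6e 63 65
print(utf8_hex(decomposed))    # 62 65 79 6f 6e 63 65 cc 81
print(utf8_hex(precomposed))   # 62 65 79 6f 6e 63 c3 a9

# A prefix (wildcard) match on an unanalyzed string field compares raw terms,
# which is approximated here with a byte-level startswith.
prefix_bytes = query_prefix.encode("utf-8")
matches_decomposed = decomposed.encode("utf-8").startswith(prefix_bytes)
matches_precomposed = precomposed.encode("utf-8").startswith(prefix_bytes)

print(matches_decomposed)   # True
print(matches_precomposed)  # False
```

Applying `unicodedata.normalize("NFC", value)` (or the equivalent in whatever language feeds the index) before indexing would make `beyonce*` stop matching, at the cost of requiring queries to use the same normalization.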