[
https://issues.apache.org/jira/browse/SOLR-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335452#comment-14335452
]
Arun Rangarajan commented on SOLR-7154:
---------------------------------------
Right, that seems to be the issue. I added another document that uses the
precomposed "Latin small letter e with acute" (U+00E9), and that document does
not match the wildcard query. So I think this needs to be fixed in my data
source itself.
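
As a rough sketch of what fixing the data source could look like, the snippet
below normalizes a value to Unicode NFC (the precomposed form) before it is
sent to Solr. It is written in Python purely for illustration; the helper name
to_nfc is hypothetical, and whether NFC is the right target form for a given
data set is an assumption.

```python
import unicodedata


def to_nfc(value: str) -> str:
    """Return value in Unicode NFC, i.e. with precomposed characters.

    Hypothetical helper: it would be applied to raw_name before indexing.
    """
    return unicodedata.normalize("NFC", value)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (the decomposed form)
decomposed = "beyonce\u0301"
normalized = to_nfc(decomposed)

# After normalization the string ends in U+00E9, so the ASCII string
# "beyonce" is no longer a literal prefix of it.
print(decomposed.startswith("beyonce"))
print(normalized.startswith("beyonce"))
```

With values stored this way, a prefix query on the ASCII spelling would no
longer match by accident. Matching accented and unaccented spellings on
purpose would be a separate decision about the field's analysis.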
> Wildcard query matches special characters
> -----------------------------------------
>
> Key: SOLR-7154
> URL: https://issues.apache.org/jira/browse/SOLR-7154
> Project: Solr
> Issue Type: Bug
> Reporter: Arun Rangarajan
> Priority: Minor
>
> I have a string field raw_name defined like this:
> {code}
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true"/>
> ...
> <field name="raw_name" type="string" indexed="true" stored="true" />
> {code}
> I have a document like this:
> {code}
> {raw_name: beyoncé}
> {code}
> Notice that the last character is a special character (accented e).
> When I issue this wildcard query:
> {code}
> q=raw_name:beyonce*
> {code}
> i.e. with the last character simply being the ASCII 'e', Solr returns the
> above document.
> Exact query:
> {code}
> /select?q=raw_name:beyonce*&wt=json&fl=raw_name
> {code}
> Response:
> {code}
> {
> "responseHeader": {
> "status": 0,
> "QTime": 0,
> "params": {
> "fl": "raw_name",
> "q": "raw_name:beyonce*",
> "wt": "json"
> }
> },
> "response": {
> "numFound": 2,
> "start": 0,
> "docs": [
> {
> "raw_name": "beyoncé"
> },
> {
> "raw_name": "beyoncé"
> }
> ]
> }
> }
> {code}
> I used the analysis tool in Solr admin (with Jetty). The raw bytes look like
> this:
> Raw bytes for beyonce: [62 65 79 6f 6e 63 65]
> Raw bytes for beyoncé: [62 65 79 6f 6e 63 65 cc 81]
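> So the bytes seem to explain why beyonce* matches beyoncé: the stored value
> is the ASCII 'e' (65) followed by a combining acute accent (cc 81), so
> "beyonce" is a literal prefix of it.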
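
The byte sequences quoted above can be reproduced outside Solr. The following
is a minimal Python sketch, not anything taken from Solr itself; the helper
name utf8_hex is hypothetical. It prints the UTF-8 bytes of the decomposed and
precomposed spellings and checks whether the ASCII query prefix is a byte
prefix of each.

```python
def utf8_hex(value: str) -> str:
    """Return the UTF-8 encoding of value as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in value.encode("utf-8"))


prefix = "beyonce"              # the ASCII prefix from the wildcard query
decomposed = "beyonce\u0301"    # 'e' + U+0301 COMBINING ACUTE ACCENT
precomposed = "beyonc\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE

for label, value in (("decomposed", decomposed), ("precomposed", precomposed)):
    is_prefix = value.encode("utf-8").startswith(prefix.encode("utf-8"))
    print(f"{label}: [{utf8_hex(value)}] prefix match: {is_prefix}")
```

Only the decomposed spelling begins with the bytes of "beyonce", which is
consistent with the wildcard query matching it.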