I'm not using the standard analyzer. I'm using a pattern analyzer that breaks 
the text on every run of characters that are not letters or digits, like this:

"analyzer": {
    "letterordigit": {
        "type": "pattern",
        "pattern": "[^\\p{L}\\p{N}]+"
    }
}


I have verified that the message field is being broken up into the tokens I 
expect (example in my first post).

So when I run a search for message:welcome-doesnotmatch, I'm expecting that 
string to be broken into tokens like so:

welcome
doesnotmatch

And for the search to therefore find 0 documents. But it doesn't -- it 
finds 1 document, the document that contains my sample message, which does 
not include the token "doesnotmatch".
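
To make that expectation concrete, here is a rough Python sketch of the tokenization and the all-tokens-must-match behavior I am expecting. It is only an approximation: Python's re module has no \p{L} or \p{N}, so [\W_]+ stands in for my pattern, and the function names letterordigit_tokens and matches_all are made up for this illustration. It does not reproduce what query_string actually does.

```python
import re

# Stand-in for the "letterordigit" pattern [^\p{L}\p{N}]+ (an approximation:
# Python's re lacks \p{L}/\p{N}, so split on non-word characters and "_").
SEPARATORS = re.compile(r"[\W_]+")


def letterordigit_tokens(text):
    """Split on runs of non-letter/non-digit characters and lowercase."""
    return [t.lower() for t in SEPARATORS.split(text) if t]


def matches_all(query_text, document_text):
    """True only if every token of the query occurs in the document."""
    doc_tokens = set(letterordigit_tokens(document_text))
    return all(t in doc_tokens for t in letterordigit_tokens(query_text))


print(letterordigit_tokens("Welcome to test.com!"))   # ['welcome', 'to', 'test', 'com']
print(letterordigit_tokens("welcome-doesnotmatch"))   # ['welcome', 'doesnotmatch']
print(matches_all("welcome-doesnotmatch", "Welcome to test.com!"))  # False
```

Under that model the search should find 0 documents, which is why the 1 hit surprises me.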

So why on Earth would this search match that document? It is behaving as if 
everything after the "-" is completely ignored. It does not matter what I 
put there, it will still match the document.

This is coming up because an end user is searching for a hyphenated word, 
like "battle-axe", and it's matching a document that does not contain the 
word "axe" at all.



On Friday, November 7, 2014 12:24:30 AM UTC-8, Jun Ohtani wrote:
>
> Hi Dave,
>
> I think the reason is that your "message" field is using the standard analyzer.
> The standard analyzer divides text on "-".
> If you change the analyzer to the whitespace analyzer, it matches 0 documents.
>
> The _validate API is useful for checking the exact query.
> Example request: 
>
> curl -XGET "/YOUR_INDEX/_validate/query?explain" -d'
> {
>   "query": {
>     "query_string": {
>       "query": "id:3955974 AND message:welcome-doesnotmatchanything"
>     }
>   }
> }'
>
> You get a response like the following. In this example, the "message" field 
> is mapped with "index": "not_analyzed".
> {
>    "valid": true,
>    "_shards": {
>       "total": 1,
>       "successful": 1,
>       "failed": 0
>    },
>    "explanations": [
>       {
>          "index": "YOUR_INDEX",
>          "valid": true,
>          "explanation": "+id:3955974 +message:welcome-doesnotmatchanything"
>       }
>    ]
> }
>
>
> See: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-validate.html#search-validate
>
> I hope that those help you out.
>
> Regards,
> Jun
>
>
> 2014-11-07 9:47 GMT+09:00 Dave Reed <infin...@gmail.com>:
>
>> I have a document with a field "message", that contains the following 
>> text (truncated):
>>
>> Welcome to test.com!
>>
>> The message field is mapped to have an analyzer that breaks that string 
>> into the following tokens:
>>
>> welcome
>> to
>> test
>> com
>>
>> But, when I search with a query like this:
>>
>> {
>>   "query": {
>>
>>     "query_string": {
>>       "query": "id:3955974 AND message:welcome-doesnotmatchanything"
>>     }
>>   }
>> }
>>
>>
>>
>> To my surprise, it finds the document (3955974 is the document id). The 
>> dash and everything after it seems to be ignored, because it does not 
>> matter what I put there, it will still match the document.
>>
>> I've tried escaping it:
>>
>> {
>>   "query": {
>>     "query_string": {
>>       "query": "id:3955974 AND message:welcome\\-doesnotmatchanything"
>>     }
>>   }
>> }
>> (note the double escape since it has to be escaped for the JSON too)
>>
>> But that makes no difference. I still get 1 matching document. If I put 
>> it in quotes it works:
>>
>> {
>>   "query": {
>>     "query_string": {
>>       "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\""
>>     }
>>   }
>> }
>>
>> It works, meaning it matches 0 documents, since that document does not 
>> contain the "doesnotmatchanything" token. That's great, but I don't 
>> understand why the unquoted version does not work. This query is being 
>> generated so I can't easily just decide to start quoting it, and I can't 
>> always do that anyway since the user is sometimes going to use wildcards, 
>> which can't be quoted if I want them to function. I was under the 
>> assumption that an EscapedUnquotedString is the same as a quoted unescaped 
>> string (in other words, foo:a\b\c === foo:"abc", assuming all special 
>> characters are escaped in the unquoted version).
>>
>> I'm only on ES 1.01, but I don't see anything new or changes that would 
>> have impacted this behavior in later versions.
>>
>> Any insights would be helpful! :)
>>
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/1dbfa1d5-7301-460b-ae9c-3665cfa79c96%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> -----------------------
> Jun Ohtani
> blog : http://blog.johtani.info
>  

