Re: query_string can't find token that _analyze shows is generated, but term query can

Ivan Brusic Thu, 21 Aug 2014 10:56:17 -0700

One more thing! The match query does not go through the query parser phase.


http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_comparison_to_query_string_field

curl -XPOST "http://localhost:9200/example/example/_search?pretty=true" -d '
{
  "query": {
    "match": {
      "name": "\"exampleof bug\""
    }
  }
}
'
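
Since match hands the whole string to the field's analyzer, it can help to run
the query text through _analyze as well and compare the tokens with what was
indexed. A rough sketch, reusing the url, index and analyzer names from the
script quoted below; the helper names analyze_url and show_tokens are made up
for illustration, and I have not run this against the index in this thread:

```shell
#!/bin/sh
# Values copied from the script quoted in this thread.
url="http://localhost:9200"
defaultIndex="example"

# Hypothetical helper: build the _analyze URL for a named analyzer.
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s&pretty=true' "$url" "$defaultIndex" "$1"
}

# Hypothetical helper: print the tokens the analyzer produces for some text.
show_tokens() {
  curl -s -XGET "$(analyze_url "$1")" -d "$2"
}

# Usage (needs a running node):
#   show_tokens my_analyzer 'ExampleOf Bug'   # text as indexed
#   show_tokens my_analyzer 'exampleof bug'   # text as searched
```

If the two token lists differ, a query that analyzes its input will look for
different tokens than the ones stored for the document.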


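On the escaped-space idea in the message quoted below: as far as I know, the
Lucene parser treats a backslash-escaped space as part of the term, so the
text should reach the analyzer unsplit. Inside a JSON body the backslash
itself has to be doubled, because a backslash followed by a space is not a
valid JSON escape, and that may be why the earlier attempt failed to parse.
A minimal sketch; escape_ws and query_body are hypothetical helpers, and I
have not checked the search result against this index:

```shell
#!/bin/sh
# Hypothetical helper: replace each space with a JSON-safe escaped space,
# i.e. two backslashes followed by the space.
escape_ws() {
  printf '%s' "$1" | sed 's/ /\\\\ /g'
}

# Hypothetical helper: build a query_string body for field $1 and text $2.
query_body() {
  printf '{"query":{"query_string":{"query":"%s:%s"}}}\n' "$1" "$(escape_ws "$2")"
}

query_body name 'exampleof bug'
# prints {"query":{"query_string":{"query":"name:exampleof\\ bug"}}}
```

The output can be piped to curl with -d @- against the same _search URL as
above.
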

On Thu, Aug 21, 2014 at 10:49 AM, ben <billumi...@gmail.com> wrote:

> In the ES documentation it talks about escape characters and space is one
> of them. Seems like if you escaped the query with a "\ " it would ignore
> that during the parsing.
>
> Thanks for your help.
>
> On Thursday, August 21, 2014 10:42:32 AM UTC-7, Ivan Brusic wrote:
>
>> In general, if you are using the keyword tokenizer or non analyzed
>> fields, then query string queries should probably not be used. Phrase
>> queries and the keyword tokenizer also do not mix well.
>>
>> Your OR queries succeed because "bug" is a token in your index.
>>
>> --
>> Ivan
>>
>>
>> On Thu, Aug 21, 2014 at 10:26 AM, ben <billu...@gmail.com> wrote:
>>
>>> Any idea why single quotes work?
>>>
>>> This works but doesn't match the lucene query syntax.
>>>
>>> curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
>>> {
>>>   "query": {
>>>     "query_string": {
>>>       "query": "name:''exampleof bug''"
>>>     }
>>>   }
>>> }
>>> '
>>>
>>> On Thursday, August 21, 2014 10:09:29 AM UTC-7, Ivan Brusic wrote:
>>>
>>>> The query string query is a phrase query "\"exampleof bug\""
>>>> The term query is looking for a single token "exampleof bug"
>>>>
>>>> The query parser will not use your tokenizer to parse the phrase. It
>>>> will tokenize based on whitespace and then apply the filters to each term.
>>>> Your index does not contain the token "exampleof" and your analyze API
>>>> example confirms it. The issue of the query parser is a long standing one
>>>> in Lucene.
>>>>
>>>> --
>>>> Ivan
>>>>
>>>>
>>>> On Thu, Aug 21, 2014 at 9:56 AM, ben <billu...@gmail.com> wrote:
>>>>
>>>>> But the query is this...
>>>>>
>>>>> name:"exampleof bug"
>>>>>
>>>>> This should find an exact match in the field name. That exact match
>>>>> token exists.
>>>>>
>>>>> The Lucene syntax documentation, under the "Fields" section, shows that
>>>>> a double quote is the correct character for this:
>>>>> http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
>>>>> The term is found by query_string when using single quotes, but that
>>>>> doesn't match the Lucene query documentation.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Thursday, August 21, 2014 9:52:16 AM UTC-7, Ivan Brusic wrote:
>>>>>
>>>>>> I suspect the issue is the way the query parser works. The query
>>>>>> phrase "exampleof bug" will be parsed into a query for the tokens
>>>>>> "exampleof" and "bug" that are adjacent to each other. The issue is that
>>>>>> you do not have two such tokens, instead you have a token with the
>>>>>> value "exampleof bug", which is a single token with a space in it.
>>>>>> According to Lucene, they are not the same thing. You would need to
>>>>>> create an analyzer that would create the tokens "exampleof" and "bug".
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 21, 2014 at 8:47 AM, ben <billu...@gmail.com> wrote:
>>>>>>
>>>>>>> Also meant to include this in the script.
>>>>>>>
>>>>>>> echo "query_string query using single quote which does not match
>>>>>>> lucene query documentation"
>>>>>>> curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
>>>>>>> {
>>>>>>>   "query": {
>>>>>>>     "query_string": {
>>>>>>>       "query": "name:''exampleof bug''"
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
>>>>>>> '
>>>>>>>
>>>>>>> On Thursday, August 21, 2014 8:39:14 AM UTC-7, ben wrote:
>>>>>>>>
>>>>>>>> I have attached a short bash script to recreate the situation. I
>>>>>>>> have a fairly simple custom analyzer that I want to break on camel
>>>>>>>> case, so lowercase is last. Using the _analyze endpoint I can see the
>>>>>>>> token I am searching for is generated by the analyzer; however,
>>>>>>>> searching for it with query_string yields a different result than a
>>>>>>>> term query. I put comments in the script to explain in more detail.
>>>>>>>>
>>>>>>>> Thanks for any help!
>>>>>>>>
>>>>>>>> #!/bin/sh
>>>>>>>>
>>>>>>>> url="http://localhost:9200"
>>>>>>>> defaultIndex="example"
>>>>>>>>
>>>>>>>> echo "Start over...this will fail the first time the script is run
>>>>>>>> since the index will not exist"
>>>>>>>> curl -XDELETE "$url/$defaultIndex?refresh=true"
>>>>>>>>
>>>>>>>> echo "Create index with custom analyzer"
>>>>>>>> curl -XPUT "$url/$defaultIndex" -d '{
>>>>>>>>     "index": {
>>>>>>>>         "analysis": {
>>>>>>>>             "filter": {
>>>>>>>>                 "my_worddelim": {
>>>>>>>>                     "type": "word_delimiter",
>>>>>>>>                     "split_on_case_change": true,
>>>>>>>>                     "preserve_original": true
>>>>>>>>                 }
>>>>>>>>             },
>>>>>>>>             "analyzer": {
>>>>>>>>                 "my_analyzer": {
>>>>>>>>                     "type":         "custom",
>>>>>>>>                     "char_filter":  [ "html_strip" ],
>>>>>>>>                     "tokenizer":    "keyword",
>>>>>>>>                     "filter":       [ "stop", "my_worddelim", "lowercase" ]
>>>>>>>>                 }
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }'
>>>>>>>>
>>>>>>>> echo
>>>>>>>>
>>>>>>>> curl -XPUT "$url/$defaultIndex/example/_mapping" -d '{
>>>>>>>>     "example" : {
>>>>>>>>         "properties" : {
>>>>>>>>             "name": {
>>>>>>>>                 "type" : "multi_field",
>>>>>>>>                 "path": "just_name",
>>>>>>>>                 "fields" : {
>>>>>>>>                     "name": { "type": "string", "analyzer": "my_analyzer" },
>>>>>>>>                     "sample" : { "type" : "string", "index" : "not_analyzed" },
>>>>>>>>                     "sample_name" : { "type" : "string", "analyzer": "my_analyzer" }
>>>>>>>>                 }
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }'
>>>>>>>>
>>>>>>>> echo "Shows the lowercase token exampleofbug is generated"
>>>>>>>> curl -XGET "$url/$defaultIndex/_analyze?analyzer=my_analyzer&pretty=true" -d 'ExampleOf Bug'
>>>>>>>>
>>>>>>>> echo "Post the document (haven't tried with non-bulk request)"
>>>>>>>> curl -XPOST "$url/$defaultIndex/example/_bulk?refresh=true" -d '
>>>>>>>> { "index" : {"_index":"example","_type":"example","_id":"2169167","_version_type":"internal","_timestamp":0} }
>>>>>>>> {"name":"ExampleOf Bug"}
>>>>>>>> '
>>>>>>>>
>>>>>>>> echo
>>>>>>>>
>>>>>>>> echo "query_string query is unable to find token in the name field
>>>>>>>> even though the path is just_name. i also tried escaping space per
>>>>>>>> documentation and it fails to parse"
>>>>>>>> curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
>>>>>>>> {
>>>>>>>>   "query": {
>>>>>>>>     "query_string": {
>>>>>>>>       "query": "name:\"exampleof bug\""
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> '
>>>>>>>>
>>>>>>>> echo
>>>>>>>>
>>>>>>>> echo "Can successfully find token in name field that I was unable
>>>>>>>> to find with query_string"
>>>>>>>> curl -XPOST "$url/$defaultIndex/example/_search?pretty=true" -d '
>>>>>>>> {
>>>>>>>>   "query": {
>>>>>>>>     "term": {
>>>>>>>>       "name": "exampleof bug"
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> '
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/fb920c7a-dce5-4272-8b80-1f148e96f8ae%40googlegroups.com.
>>>>>>>
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>

