Re: Query_string search containing a dash has unexpected results

2014-11-11 Thread joergpra...@gmail.com
If you want to translate battle-axe into "battle axe", note that the
correct method would be to use a phrase search with slop 0. The AND
operator may also work in most cases, but the word positions will be lost:
you get a less precise search for docs that contain "battle" and "axe"
anywhere in the field.
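
As a rough illustration of the difference between the two approaches, here is a sketch that builds both request bodies. The helper names, the "message" field, and the choice of Python are assumptions made for this example, not something taken from this thread:

```python
import json

def phrase_query(field, text):
    """Quote the user's text so query_string treats it as one phrase."""
    # Escape backslashes and double quotes so the text stays inside the quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "query_string": {
                "query": '%s:"%s"' % (field, escaped),
                "phrase_slop": 0,  # analyzed tokens must be adjacent and in order
            }
        }
    }

def and_query(field, text):
    """Require every analyzed token, but ignore where the tokens occur."""
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": text,
                "default_operator": "AND",
            }
        }
    }

print(json.dumps(phrase_query("message", "battle-axe"), indent=2))
print(json.dumps(and_query("message", "battle-axe"), indent=2))
```

The phrase form keeps word positions, so "axe" has to follow "battle" directly. The AND form only requires both tokens to occur somewhere in the field.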

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEwS3ZGs540HcpBipfa__Q8fjPRVkrrHCt0KXJpKn3a2Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Yes, and this was the key, thank you so much. But see my reply above about 
the docs on that param being confusing. That was really the source of the 
problem for me.


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
No, I am not saying that. I am saying this:
GET my_index_v1/mytype/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "welcome-doesnotmatchanything",
      "default_operator": "AND"
    }
  }
}

Here I will not get a match, as expected. If I do not specify it, then OR is the 
default operator and it will match.
amish



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Ok... specifying default_operator: AND worked

In that case, I'd like to say that the docs on that option are incomplete 
or confusing. It says:

The default operator used if no explicit operator is specified. For example, 
with a default operator of OR, the query capital of Hungary is translated 
to capital OR of OR Hungary, and with default operator of AND, the same 
query is translated to capital AND of AND Hungary. The default value is OR.

That's all well and good, but my query does not have multiple terms like 
that. I have a single term for a single field. The default operator is 
applying to the resulting tokens of that, after they are generated by the 
analyzer. I assumed that the default operator applied at the level of the 
query being parsed and that had nothing at all to do with the analyzer. 
Making that clearer could have saved me a lot of time :)
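
To make the behaviour described above concrete, here is a toy model in Python. It is not Elasticsearch code: the tokenize, expand_term and matches helpers are made up for illustration, and the regex only approximates a pattern analyzer that splits on characters that are neither letters nor digits.

```python
import re

def tokenize(text):
    """Approximate the analyzer: split on non-letter/non-digit runs, then lowercase."""
    return [t.lower() for t in re.split(r"[\W_]+", text) if t]

def expand_term(field, term, default_operator="OR"):
    """Show how one unquoted term becomes several clauses after analysis."""
    clauses = ["%s:%s" % (field, tok) for tok in tokenize(term)]
    return (" %s " % default_operator).join(clauses)

def matches(doc_text, term, default_operator="OR"):
    """Check a document against the clauses produced from a single term."""
    doc_tokens = set(tokenize(doc_text))
    hits = [tok in doc_tokens for tok in tokenize(term)]
    return all(hits) if default_operator == "AND" else any(hits)

doc = "Welcome to test.com!"
term = "welcome-doesnotmatchanything"
print(expand_term("message", term))  # message:welcome OR message:doesnotmatchanything
print(matches(doc, term, "OR"))      # True
print(matches(doc, term, "AND"))     # False
```

In this model the operator is applied after the term has been split into tokens, which is why a "single term" can still be affected by it.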


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
My default operator doesn't matter if I understand it correctly, because 
I'm specifying the operator explicitly. Also, I can reproduce this behavior 
using a single search term, so there's no operator to speak of. Unless 
you're saying that the default operator applies to a single term query if 
it is broken into tokens?

> Note that using the welcome-doesnotmatchanything analyzer will break 
> into two tokens with OR and your document will match unless you use AND


This concerns me... my search looks like:

message:welcome-doesnotmatchanything

I cannot break that into an AND. The entire thing is a value provided by 
the end user. You're saying I should on the app side break the string they 
entered into tokens and join them with ANDs? That doesn't seem viable...

Let me back up and say what I'm expecting the user to be able to do. 
There's a single text box where they can enter a search query, with the 
following rules:
1. The user may use a trailing wildcard, e.g. foo*
2. The user may enter multiple terms separated by a space. Only documents 
containing all of the terms will match.
3. The user might enter special characters, such as in "battle-axe", simply 
because that is what they think they should search for, which should match 
documents containing "battle" and "axe" (the same as a search for "battle 
axe").

To that end, I am taking their search string and forming a search like this:

message: AND...

Where the string is split on spaces and joined with the AND clauses. For 
each individual part of the search phrase, I take care of escaping special 
characters (except "*" since I am allowing them to use wildcards). For 
example, if they entered "foo bar!", I would generate this query:

message:foo AND message:bar\!

The problem is they are entering "battle-axe", causing me to generate this:

message:battle\-axe

But that ends up being the same as:

(message:battle OR message:axe)

I guess that is what I was not expecting. Because of this behavior, I have 
to know from my app point of view what tokens I should be splitting the 
original string on, so that I can join them back together with ANDs. But 
that means basically reimplementing the tokenizer on my end, does it not? 
There must be a better way? Like specifying I want those terms to be joined 
with ANDs instead?
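
The query generation described above could look roughly like the sketch below. The function names and the exact list of escaped characters are assumptions for illustration. The "default_operator": "AND" setting is the option reported elsewhere in this thread to make the tokens of a single term be joined with AND, so the application does not have to reimplement the tokenizer.

```python
import re

# Characters treated as special here; this list is an assumption for the sketch.
# "*" is left out on purpose so that trailing wildcards keep working.
_SPECIAL = re.compile(r'([+\-=&|!(){}\[\]^"~?:\\/])')

def escape_term(term):
    """Put a backslash in front of every special character in one term."""
    return _SPECIAL.sub(r"\\\1", term)

def build_query(field, user_input):
    """Split on whitespace, escape each part, and join the parts with AND."""
    parts = ["%s:%s" % (field, escape_term(t)) for t in user_input.split()]
    return " AND ".join(parts)

def build_body(field, user_input):
    """Wrap the query string in a request body that also ANDs analyzed tokens."""
    return {
        "query": {
            "query_string": {
                "query": build_query(field, user_input),
                "default_operator": "AND",
            }
        }
    }

print(build_query("message", "foo bar!"))    # message:foo AND message:bar\!
print(build_query("message", "battle-axe"))  # message:battle\-axe
```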


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
I created a test index using your pattern and I am seeing the appropriate 
behaviour.
I am assuming you are using the same analyzer for search/query, as well as 
ensuring that your default operator is AND.
Note that the analyzer will break welcome-doesnotmatchanything into 
two tokens joined with OR, so your document will match unless you use AND.
amish


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Also interesting... if I run the query with explain=true, I see information 
in the details about the "welcome" token, but there's no mention at all 
about the "doesnotmatch" token. I guess it wouldn't mention it though, 
since if it did, the document shouldn't match in the first place.


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Yes of course :) Here we go:

{
   - valid: true
   - _shards: {
      - total: 1
      - successful: 1
      - failed: 0
   }
   - explanations: [
      - {
         - index: index_v1
         - valid: true
         - explanation: message:welcome message:doesnotmatch
      }
   ]
}

It pasted a little weird but that's it.




Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
Can you run the validate query and post the output? That would be helpful.
amish


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
I'm not using the standard analyzer. I'm using a pattern analyzer that breaks 
the text on all characters that are not letters or digits, like this:

"analyzer": {
    "letterordigit": {
        "type": "pattern",
        "pattern": "[^\\p{L}\\p{N}]+"
    }
}


I have verified that the message field is being broken up into the tokens I 
expect (example in my first post).

So when I run a search for message:welcome-doesnotmatch, I'm expecting that 
string to be broken into tokens like so:

welcome
doesnotmatch

And for the search to therefore find 0 documents. But it doesn't -- it 
finds 1 document, the document that contains my sample message, which does 
not include the token "doesnotmatch".

So why on Earth would this search match that document? It is behaving as if 
everything after the "-" is completely ignored. It does not matter what I 
put there, it will still match the document.

This is coming up because an end user is searching for a hyphenated word, 
like "battle-axe", and it's matching a document that does not contain the 
word "axe" at all.




Re: Query_string search containing a dash has unexpected results

2014-11-07 Thread Jun Ohtani
Hi Dave,

I think the reason is that your "message" field uses the standard analyzer.
The standard analyzer splits text on "-".
If you change the analyzer to the whitespace analyzer, the query matches 0 documents.

The _validate API is useful for checking the exact query that is executed.
Example request:

curl -XGET "/YOUR_INDEX/_validate/query?explain" -d'
{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}'

You get the following response. In this example, the "message" field is
"index": "not_analyzed".
{
   "valid": true,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "explanations": [
      {
         "index": "YOUR_INDEX",
         "valid": true,
         "explanation": "+id:3955974 +message:welcome-doesnotmatchanything"
      }
   ]
}


See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-validate.html#search-validate
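
If you would rather build the same check from code, a minimal Python sketch could look like this. The base URL http://localhost:9200 and the helper names are assumptions for this example. The request is only constructed here, not sent, so no running cluster is needed to try it.

```python
import json
import urllib.request

def build_validate_request(base_url, index, query_string):
    """Prepare a _validate/query?explain request for a query_string query."""
    url = "%s/%s/_validate/query?explain" % (base_url.rstrip("/"), index)
    body = json.dumps({"query": {"query_string": {"query": query_string}}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )

def explanations(response_body):
    """Pull the per-index explanation strings out of a _validate response."""
    return [e["explanation"] for e in json.loads(response_body)["explanations"]]

req = build_validate_request(
    "http://localhost:9200",  # assumed local node
    "YOUR_INDEX",
    "id:3955974 AND message:welcome-doesnotmatchanything",
)
print(req.full_url)
# With a running cluster: print(explanations(urllib.request.urlopen(req).read()))
```

In the explanation string, a clause prefixed with "+" is required, while a clause without a prefix is optional, which is how an OR between tokens shows up.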

I hope that those help you out.

Regards,
Jun




-- 
---
Jun Ohtani
blog : http://blog.johtani.info


Query_string search containing a dash has unexpected results

2014-11-06 Thread Dave Reed
I have a document with a field "message", that contains the following text 
(truncated):

Welcome to test.com!

The assertion field is mapped to have an analyzer that breaks that string 
into the following tokens:

welcome
to
test
com

But, when I search with a query like this:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}



To my surprise, it finds the document (3955974 is the document id). The 
dash and everything after it seems to be ignored, because it does not 
matter what I put there, it will still match the document.

I've tried escaping it:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome\\-doesnotmatchanything"
    }
  }
}
(note the double escape since it has to be escaped for the JSON too)

But that makes no difference. I still get 1 matching document. If I put it 
in quotes it works:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\""
    }
  }
}

It works, meaning it matches 0 documents, since that document does not 
contain the "doesnotmatchanything" token. That's great, but I don't 
understand why the unquoted version does not work. This query is being 
generated so I can't easily just decide to start quoting it, and I can't 
always do that anyway since the user is sometimes going to use wildcards, 
which can't be quoted if I want them to function. I was under the 
assumption that an EscapedUnquotedString is the same as a quoted unescaped 
string (in other words, foo:a\b\c === foo:"abc", assuming all special 
characters are escaped in the unquoted version).

I'm only on ES 1.01, but I don't see anything new or changes that would 
have impacted this behavior in later versions.

Any insights would be helpful! :)



