Re: Query_string search containing a dash has unexpected results

2014-11-11 Thread joergpra...@gmail.com
If you want to translate "battle-axe" into "battle axe", note that the
correct method would be to introduce a phrase search with slop 0. The AND
operator may also work in most cases, but the word positions will be lost:
you get a less precise search that matches docs containing "battle" and
"axe" anywhere in the field.

Jörg
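
A minimal sketch of the phrase approach, assuming the request body is built on the application side in Python. The helper names phrase_clause and phrase_request are made up for illustration; phrase_slop is the query_string parameter that sets the default slop for quoted phrases. Elsewhere in this thread, quoting the term is reported to give the expected result, but note that wildcards do not work inside quotes, so a term with a trailing * would need separate handling.

```python
import json


def phrase_clause(field, text):
    """Return field:"text" with the user's text wrapped in double quotes.

    Inside a quoted phrase, only the backslash and the double quote
    are escaped here.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def phrase_request(field, text):
    """Build a query_string request body that searches for an exact phrase."""
    return {
        "query": {
            "query_string": {
                "query": phrase_clause(field, text),
                # Slop 0: the analyzed tokens must be adjacent and in order.
                "phrase_slop": 0,
            }
        }
    }


# json.dumps applies the JSON-level escaping of the inner quotes.
print(json.dumps(phrase_request("message", "battle-axe"), indent=2))
```

Whether the phrase actually matches still depends on the analyzer configured for the field, since the quoted text is analyzed into tokens before the phrase query is built.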


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEwS3ZGs540HcpBipfa__Q8fjPRVkrrHCt0KXJpKn3a2Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
I'm not using the standard analyzer, I'm using a pattern that will break 
the text on all non-word characters, like this:

"analyzer": {
    "letterordigit": {
        "type": "pattern",
        "pattern": "[^\\p{L}\\p{N}]+"
    }
}


I have verified that the message field is being broken up into the tokens I 
expect (example in my first post).

So when I run a search for message:welcome-doesnotmatch, I'm expecting that 
string to be broken into tokens like so:

welcome
doesnotmatch

And for the search to therefore find 0 documents. But it doesn't -- it 
finds 1 document, the document that contains my sample message, which does 
not include the token doesnotmatch.

So why on Earth would this search match that document? It is behaving as if 
everything after the - is completely ignored. It does not matter what I 
put there, it will still match the document.

This is coming up because an end user is searching for a hyphenated word, 
like battle-axe, and it's matching a document that does not contain the 
word axe at all.



 2014-11-07 9:47 GMT+09:00 Dave Reed infin...@gmail.com:

 I have a document with a field "message" that contains the following 
 text (truncated):

 Welcome to test.com!

 The message field is mapped to have an analyzer that breaks that string 
 into the following tokens:

 welcome
 to
 test
 com

 But, when I search with a query like this:

 {
   "query": {
     "query_string": {
       "query": "id:3955974 AND message:welcome-doesnotmatchanything"
     }
   }
 }



 To my surprise, it finds the document (3955974 is the document id). The 
 dash and everything after it seems to be ignored, because it does not 
 matter what I put there, it will still match the document.

 I've tried escaping it:

 {
   "query": {
     "query_string": {
       "query": "id:3955974 AND message:welcome\\-doesnotmatchanything"
     }
   }
 }
 (note the double escape since it has to be escaped for the JSON too)

 But that makes no difference. I still get 1 matching document. If I put 
 it in quotes it works:

 {
   "query": {
     "query_string": {
       "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\""
     }
   }
 }

 It works, meaning it matches 0 documents, since that document does not 
 contain the doesnotmatchanything token. That's great, but I don't 
 understand why the unquoted version does not work. This query is being 
 generated so I can't easily just decide to start quoting it, and I can't 
 always do that anyway since the user is sometimes going to use wildcards, 
 which can't be quoted if I want them to function. I was under the 
 assumption that an EscapedUnquotedString is the same as a quoted unescaped 
 string (in other words, foo:a\b\c === foo:"abc", assuming all special 
 characters are escaped in the unquoted version).

 I'm only on ES 1.01, but I don't see anything new or changes that would 
 have impacted this behavior in later versions.

 Any insights would be helpful! :)

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
Can you run the validate query and post the output? That will be helpful.
amish



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Yes of course :) Here we go:

{
   "valid": true,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "explanations": [
      {
         "index": "index_v1",
         "valid": true,
         "explanation": "message:welcome message:doesnotmatch"
      }
   ]
}

That's the full output.
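
For reading the explanation string: in Lucene's textual form of a boolean query, a clause prefixed with + is required, a clause prefixed with - is prohibited, and a clause with no prefix is optional. The following is a rough sketch of that reading, not part of any Elasticsearch API; the function name clause_occurrence is made up for illustration, and the whitespace split is an assumption that only holds for flat explanations like the ones in this thread (no phrases or nested groups).

```python
def clause_occurrence(explanation):
    """Label each top-level clause of a flat explanation string."""
    labelled = []
    for clause in explanation.split():
        if clause.startswith("+"):
            labelled.append((clause[1:], "must"))
        elif clause.startswith("-"):
            labelled.append((clause[1:], "must_not"))
        else:
            # No prefix: the clause is optional, so any one of several
            # such clauses is enough for a document to match.
            labelled.append((clause, "should"))
    return labelled


# Output from this message: both clauses are optional.
print(clause_occurrence("message:welcome message:doesnotmatch"))

# Output posted for a not_analyzed field: both clauses are required.
print(clause_occurrence("+id:3955974 +message:welcome-doesnotmatchanything"))
```

Read this way, the output above says that a document matches if it contains either welcome or doesnotmatch.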





Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Also interesting... if I run the query with explain=true, I see information 
in the details about the welcome token, but there's no mention at all 
about the doesnotmatch token. I guess it wouldn't mention it though, 
since if it did, the document shouldn't match in the first place.



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
I created a test index using your pattern and I am seeing the appropriate 
behaviour.
I am assuming you are using the same analyzer at search time and that you 
are setting the default operator to AND.
Note that the analyzer will break welcome-doesnotmatchanything into two 
tokens, which are combined with OR by default, so your document will match 
unless you use AND.
amish
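
A toy model of the behaviour described above, as a sketch only. It does not call Elasticsearch; it just mimics the two steps (analyze the term into tokens, then combine the tokens with the default operator). Python's re module has no \p{L} or \p{N} classes, so [\W_]+ is used as an approximation of the pattern analyzer from this thread, and the lowercasing is an assumption about that analyzer's default behaviour. The names analyze and matches are made up for illustration.

```python
import re

# Approximation of the pattern [^\p{L}\p{N}]+ : split on runs of characters
# that are not word characters, and also on the underscore.
SPLIT = re.compile(r"[\W_]+")


def analyze(text):
    """Lowercase the text and split it into tokens, dropping empty strings."""
    return [tok for tok in SPLIT.split(text.lower()) if tok]


def matches(doc_text, query_text, default_operator="OR"):
    """Return True if the analyzed query matches the analyzed document.

    With OR, one shared token is enough; with AND, every query token
    must be present in the document.
    """
    doc_tokens = set(analyze(doc_text))
    query_tokens = analyze(query_text)
    combine = all if default_operator == "AND" else any
    return combine(tok in doc_tokens for tok in query_tokens)


doc = "Welcome to test.com!"
print(analyze(doc))
print(matches(doc, "welcome-doesnotmatchanything"))
print(matches(doc, "welcome-doesnotmatchanything", "AND"))
```

With OR the document matches because the welcome token is present; with AND it does not, because doesnotmatchanything is missing.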



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
My default operator doesn't matter if I understand it correctly, because 
I'm specifying the operator explicitly. Also, I can reproduce this behavior 
using a single search term, so there's no operator to speak of. Unless 
you're saying that the default operator applies to a single term query if 
it is broken into tokens?

 Note that the analyzer will break welcome-doesnotmatchanything into two 
 tokens with OR and your document will match unless you use AND


This concerns me... my search looks like:

message:welcome-doesnotmatchanything

I cannot break that into an AND. The entire thing is a value provided by 
the end user. You're saying I should on the app side break the string they 
entered into tokens and join them with ANDs? That doesn't seem viable...

Let me back up and say what I'm expecting the user to be able to do. 
There's a single text box where they can enter a search query, with the 
following rules:
1. The user may use a trailing wildcard, e.g. foo*
2. The user may enter multiple terms separated by a space. Only documents 
containing all of the terms will match.
3. The user might enter special characters, such as in battle-axe, simply 
because that is what they think they should search for, which should match 
documents containing battle and axe (the same as a search for battle 
axe).

To that end, I am taking their search string and forming a search like this:

message:searchterm AND...

Where the string is split on spaces and joined with the AND clauses. For 
each individual part of the search phrase, I take care of escaping special 
characters (except * since I am allowing them to use wildcards). For 
example, if they entered foo bar!, I would generate this query:

message:foo AND message:bar\!

The problem is they are entering battle-axe, causing me to generate this:

message:battle\-axe

But that ends up being the same as:

(message:battle OR message:axe)

I guess that is what I was not expecting. Because of this behavior, I have 
to know from my app point of view what tokens I should be splitting the 
original string on, so that I can join them back together with ANDs. But 
that means basically reimplementing the tokenizer on my end, does it not? 
There must be a better way? Like specifying I want those terms to be joined 
with ANDs instead?
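
The query generation described in this message can be sketched as follows. This is an illustration in Python, not the poster's actual code: the names RESERVED, escape_term and build_query_string are made up, and the exact set of escaped characters is an assumption based on the characters that have special meaning in the query_string syntax (with * deliberately left out so that trailing wildcards keep working).

```python
# Characters that get a backslash in front of them. "*" is intentionally
# not in this set so that a user-supplied wildcard such as foo* still works.
RESERVED = set('+-=&|><!(){}[]^"~?:\\/')


def escape_term(term):
    """Backslash-escape every reserved character in a single term."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


def build_query_string(user_input, field="message"):
    """Split on whitespace, escape each term, and join the terms with AND."""
    return " AND ".join(
        f"{field}:{escape_term(term)}" for term in user_input.split()
    )


print(build_query_string("foo bar!"))    # message:foo AND message:bar\!
print(build_query_string("battle-axe"))  # message:battle\-axe
```

The second call shows the problem case: the escaped dash keeps the term in one piece for the query parser, but the field's analyzer still splits it into two tokens afterwards, and those tokens are then combined with the default operator.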



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Ok... specifying default_operator: AND worked

In that case, I'd like to say that the docs on that option are incomplete 
or confusing. It says:

The default operator used if no explicit operator is specified. For example, 
with a default operator of OR, the query capital of Hungary is translated 
to capital OR of OR Hungary, and with default operator of AND, the same 
query is translated to capital AND of AND Hungary. The default value is OR.

That's all well and good, but my query does not have multiple terms like 
that. I have a single term for a single field. The default operator is 
applying to the resulting tokens of that, after they are generated by the 
analyzer. I assumed that the default operator applied at the level of the 
query being parsed and that had nothing at all to do with the analyzer. 
Making that clearer could have saved me a lot of time :)



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
No, I am not saying that. I am saying this:
GET my_index_v1/mytype/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "welcome-doesnotmatchanything",
      "default_operator": "AND"
    }
  }
}

Here I will not get a match, as expected. If I do not specify it, then OR is 
the default operator and it will match.
amish




Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed
Yes, and this was the key, thank you so much. But see my reply above about 
the docs on that param being confusing. That was really the source of the 
problem for me.



Re: Query_string search containing a dash has unexpected results

2014-11-07 Thread Jun Ohtani
Hi Dave,

I think the reason is that your message field uses the standard analyzer.
The standard analyzer splits text on "-".
If you change the analyzer to the whitespace analyzer, the query matches 0
documents.

The _validate API is useful for checking the exact query that is executed.
Example request:

curl -XGET /YOUR_INDEX/_validate/query?explain -d'
{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}'

You will get a response like the following. In this example, the message
field is mapped with "index": "not_analyzed".
{
   "valid": true,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "explanations": [
      {
         "index": "YOUR_INDEX",
         "valid": true,
         "explanation": "+id:3955974 +message:welcome-doesnotmatchanything"
      }
   ]
}


See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-validate.html#search-validate

I hope this helps.

Regards,
Jun
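
If the validate request is generated from application code, a sketch along these lines may help. It only builds the path and the JSON body and sends nothing; the function name validate_request is made up for illustration, and YOUR_INDEX is the same placeholder as in the curl example. Using json.dumps also takes care of the JSON-level escaping, so a query containing a backslash-escaped dash is written with the doubled backslash automatically.

```python
import json
from urllib.parse import quote


def validate_request(index, query_string):
    """Return (path, body) for a _validate/query?explain request."""
    path = f"/{quote(index)}/_validate/query?explain"
    body = json.dumps(
        {"query": {"query_string": {"query": query_string}}},
        indent=2,
    )
    return path, body


path, body = validate_request(
    "YOUR_INDEX",
    "id:3955974 AND message:welcome\\-doesnotmatchanything",
)
print("GET", path)
print(body)
```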


-- 
---
Jun Ohtani
blog : http://blog.johtani.info
