Re: Query_string search containing a dash has unexpected results
If you want to translate battle-axe into battle axe, note that the correct method would be to introduce a phrase search with slop 0. The AND operator may also work in most cases, but the word positions will be lost: you get a less precise search that matches docs containing battle and axe anywhere in the field.

Jörg

On Tue, Nov 11, 2014 at 1:27 AM, Dave Reed <infinit...@gmail.com> wrote:
> Yes, and this was the key, thank you so much. But see my reply above about the docs on that param being confusing. That was really the source of the problem for me.

-- 
You received this message because you are subscribed to the Google Groups elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEwS3ZGs540HcpBipfa__Q8fjPRVkrrHCt0KXJpKn3a2Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
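Jörg's phrase-search suggestion can be sketched in Python. The helper names (`escape_keep_wildcard`, `build_query`) are hypothetical, and the special-character list is an assumption based on the ES 1.x query_string syntax docs. Plain terms are wrapped in double quotes, so after analysis a term such as battle-axe becomes a phrase query (query_string's `phrase_slop` defaults to 0), while terms containing `*` stay unquoted because wildcards do not work inside phrases.

```python
import json

# Characters with special meaning in the query_string syntax (assumed list,
# based on the ES 1.x query_string docs). "*" is omitted so that users can
# still type wildcards.
SPECIAL = set('+-&|!(){}[]^"~?:\\/')


def escape_keep_wildcard(term):
    """Backslash-escape special characters in one term, leaving * alone."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)


def build_query(field, user_input):
    """Hypothetical helper: build a query_string from search box text.

    Terms without * are quoted, so a term such as battle-axe is analyzed
    into a phrase query. Terms with * stay unquoted so the wildcard works.
    """
    clauses = []
    for term in user_input.split():
        if "*" in term:
            clauses.append(f"{field}:{escape_keep_wildcard(term)}")
        else:
            # Inside double quotes only \ and " have to be escaped.
            inner = term.replace("\\", "\\\\").replace('"', '\\"')
            clauses.append(f'{field}:"{inner}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    body = {"query": {"query_string": {"query": build_query("message", "battle-axe foo*")}}}
    print(json.dumps(body))
```

The design choice is to let Elasticsearch's own analyzer decide where battle-axe splits, so the application never has to reimplement the tokenizer; the quotes only ask for the resulting tokens to be adjacent and in order.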
Re: Query_string search containing a dash has unexpected results
I'm not using the standard analyzer. I'm using a pattern analyzer that breaks the text on all non-word characters, like this:

    "analyzer": {
      "letterordigit": {
        "type": "pattern",
        "pattern": "[^\\p{L}\\p{N}]+"
      }
    }

I have verified that the message field is being broken up into the tokens I expect (example in my first post). So when I run a search for message:welcome-doesnotmatch, I expect that string to be broken into these tokens:

    welcome
    doesnotmatch

and the search should therefore find 0 documents. But it doesn't: it finds 1 document, the one containing my sample message, which does not include the token doesnotmatch. So why on Earth would this search match that document? It behaves as if everything after the - is completely ignored. It does not matter what I put there; it still matches the document.

This is coming up because an end user is searching for a hyphenated word, like battle-axe, and it's matching a document that does not contain the word axe at all.

On Friday, November 7, 2014 12:24:30 AM UTC-8, Jun Ohtani wrote:
> Hi Dave,
>
> I think the reason is that your message field uses the standard analyzer. The standard analyzer splits text on -. If you change the analyzer to the whitespace analyzer, it matches 0 documents.
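To check the expected tokens outside Elasticsearch, the letterordigit analyzer can be approximated in Python. This is only an approximation, not Elasticsearch code: Python's `re` module has no `\p{L}` or `\p{N}`, so `[\W_]+` stands in for `[^\p{L}\p{N}]+`, and the lowercasing mirrors the pattern analyzer's default of lowercasing tokens. The `_analyze` API on the real index remains the authoritative check.

```python
import re

# Stand-in for [^\p{L}\p{N}]+ : runs of characters that are neither letters
# nor digits. \W already excludes letters and digits; "_" is added because
# Python counts it as a word character.
SPLIT = re.compile(r"[\W_]+")


def analyze(text):
    """Approximate the letterordigit pattern analyzer: split, then lowercase."""
    return [tok.lower() for tok in SPLIT.split(text) if tok]


print(analyze("Welcome to test.com!"))          # ['welcome', 'to', 'test', 'com']
print(analyze("welcome-doesnotmatchanything"))  # ['welcome', 'doesnotmatchanything']
```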
Re: Query_string search containing a dash has unexpected results
Can you run the validate query and post the output? That will be helpful.

amish
Re: Query_string search containing a dash has unexpected results
Yes, of course :) Here we go:

    {
      "valid": true,
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "explanations": [
        {
          "index": "index_v1",
          "valid": true,
          "explanation": "message:welcome message:doesnotmatch"
        }
      ]
    }

It pasted a little weird, but that's it.

On Monday, November 10, 2014 2:25:33 PM UTC-8, Amish Asthana wrote:
> Can you run the validate query and post the output? That will be helpful.
> amish
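In Lucene's query notation, clauses without a `+` prefix are optional, so the explanation `message:welcome message:doesnotmatch` is an OR of the two tokens. A toy model in Python (not Elasticsearch's matching code; `matches` is a hypothetical helper) shows why the sample document still matches:

```python
def matches(doc_tokens, query_tokens, operator="OR"):
    """Toy model of how one analyzed clause is evaluated against a document.

    OR  -> each token is optional (SHOULD): any single token is enough.
    AND -> each token is required (MUST): all tokens must be present.
    """
    doc = set(doc_tokens)
    hits = [tok in doc for tok in query_tokens]
    if operator == "AND":
        return all(hits)
    if operator == "OR":
        return any(hits)
    raise ValueError(f"unknown operator: {operator}")


doc = ["welcome", "to", "test", "com"]
query = ["welcome", "doesnotmatch"]
print(matches(doc, query, "OR"))   # True: "welcome" alone is enough
print(matches(doc, query, "AND"))  # False: "doesnotmatch" is missing
```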
Re: Query_string search containing a dash has unexpected results
Also interesting... if I run the query with explain=true, I see information in the details about the welcome token, but there's no mention at all of the doesnotmatch token. I guess it wouldn't mention it, though, since if it did, the document shouldn't match in the first place.

On Monday, November 10, 2014 2:45:05 PM UTC-8, Dave Reed wrote:
> Yes, of course :) Here we go:
> "explanation": "message:welcome message:doesnotmatch"
Re: Query_string search containing a dash has unexpected results
I created a test index using your pattern and I am seeing the appropriate behaviour. I am assuming you are using the same analyzer at search time as at index time, and that your DEFAULT OPERATOR is AND. Note that, with your analyzer, welcome-doesnotmatchanything will be broken into two tokens combined with OR, and your document will match unless you use AND.

amish

On Monday, November 10, 2014 2:48:06 PM UTC-8, Dave Reed wrote:
> Also interesting... if I run the query with explain=true, I see information in the details about the welcome token, but there's no mention at all of the doesnotmatch token. I guess it wouldn't mention it, though, since if it did, the document shouldn't match in the first place.
Re: Query_string search containing a dash has unexpected results
My default operator doesn't matter, if I understand it correctly, because I'm specifying the operator explicitly. Also, I can reproduce this behavior using a single search term, so there's no operator to speak of. Unless you're saying that the default operator applies to a single term if it is broken into tokens?

> Note that using the welcome-doesnotmatchanything analyzer will break into two tokens with OR and your document will match unless you use AND

This concerns me... my search looks like:

    message:welcome-doesnotmatchanything

I cannot break that into an AND. The entire thing is a value provided by the end user. You're saying I should, on the app side, break the string they entered into tokens and join them with ANDs? That doesn't seem viable...

Let me back up and say what I expect the user to be able to do. There's a single text box where they can enter a search query, with the following rules:

1. The user may use a trailing wildcard, e.g. foo*
2. The user may enter multiple terms separated by spaces. Only documents containing all of the terms will match.
3. The user might enter special characters, such as in battle-axe, simply because that is what they think they should search for. This should match documents containing battle and axe (the same as a search for battle axe).

To that end, I am taking their search string and forming a search like this:

    message:searchterm AND ...

where the string is split on spaces and the parts are joined with AND clauses. For each individual part of the search phrase, I take care of escaping special characters (except *, since I am allowing wildcards). For example, if they entered foo bar!, I would generate this query:

    message:foo AND message:bar\!

The problem is that they are entering battle-axe, causing me to generate this:

    message:battle\-axe

But that ends up being the same as:

    (message:battle OR message:axe)

I guess that is what I was not expecting. Because of this behavior, I have to know, from my app's point of view, which tokens I should split the original string on, so that I can join them back together with ANDs. But that basically means reimplementing the tokenizer on my end, does it not? There must be a better way? Like specifying that I want those terms to be joined with ANDs instead?
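The generator described above can be sketched as follows. The function names are hypothetical and the escape list is an assumed set of query_string special characters with `*` left out. The sketch makes the problem visible: escaping only affects how the query parser reads the string, and the escaped term `battle\-axe` is still handed to the field's analyzer, which splits it into battle and axe.

```python
# Assumed set of query_string special characters (based on the ES 1.x
# query_string docs); "*" is left out so trailing wildcards keep working.
SPECIAL = set('+-&|!(){}[]^"~?:\\/')


def escape(term):
    """Backslash-escape every special character in a single term."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)


def generate(field, user_input):
    """Hypothetical helper: split on whitespace, escape, join with AND."""
    return " AND ".join(f"{field}:{escape(t)}" for t in user_input.split())


print(generate("message", "foo bar!"))    # message:foo AND message:bar\!
print(generate("message", "battle-axe"))  # message:battle\-axe
```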
Re: Query_string search containing a dash has unexpected results
OK... specifying default_operator: AND worked.

In that case, I'd like to say that the docs on that option are incomplete or confusing. They say:

> The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.

That's all well and good, but my query does not have multiple terms like that. I have a single term for a single field. The default operator is being applied to the tokens the analyzer generates from that term. I assumed that the default operator applied at the level of the query being parsed and had nothing at all to do with the analyzer. Making that clearer could have saved me a lot of time :)
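Since `default_operator` also decides how the tokens produced by analyzing a single term are combined, the fix amounts to sending it with every generated query. A minimal Python sketch of such a request body; `search_body` is a hypothetical helper, and `message` is the field used in this thread:

```python
import json


def search_body(query_string, operator="AND"):
    """Hypothetical helper: wrap a generated query_string in a search body.

    default_operator applies between the terms the parser sees and also
    between the tokens the analyzer produces from one term (battle-axe ->
    battle, axe), so AND requires both tokens to be present.
    """
    return {
        "query": {
            "query_string": {
                "query": query_string,
                "default_operator": operator,
            }
        }
    }


# The backslash is doubled only because this is a Python string literal;
# json.dumps escapes it again for the JSON encoding.
print(json.dumps(search_body("message:battle\\-axe"), indent=2))
```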
Re: Query_string search containing a dash has unexpected results
No, I am not saying that. I am saying this:

    GET my_index_v1/mytype/_search
    {
      "query": {
        "query_string": {
          "default_field": "name",
          "query": "welcome-doesnotmatchanything",
          "default_operator": "AND"
        }
      }
    }

Here I do not get a match, as expected. If I do not specify it, OR is the default operator and the document will match.

amish

On Monday, November 10, 2014 4:01:14 PM UTC-8, Dave Reed wrote:
> My default operator doesn't matter, if I understand it correctly, because I'm specifying the operator explicitly. Also, I can reproduce this behavior using a single search term, so there's no operator to speak of. Unless you're saying that the default operator applies to a single term if it is broken into tokens?
Re: Query_string search containing a dash has unexpected results
Yes, and this was the key, thank you so much. But see my reply above about the docs on that param being confusing. That was really the source of the problem for me.

On Monday, November 10, 2014 4:15:05 PM UTC-8, Amish Asthana wrote:
> No, I am not saying that. I am saying this:
> GET my_index_v1/mytype/_search { "query": { "query_string": { "default_field": "name", "query": "welcome-doesnotmatchanything", "default_operator": "AND" } } }
> Here I do not get a match, as expected. If I do not specify it, OR is the default operator and the document will match.
> amish
Re: Query_string search containing a dash has unexpected results
Hi Dave,

I think the reason is that your message field uses the standard analyzer. The standard analyzer splits text on -. If you change the analyzer to the whitespace analyzer, the query matches 0 documents.

The _validate API is useful for checking the exact query. Example request:

    curl -XGET /YOUR_INDEX/_validate/query?explain -d'
    {
      "query": {
        "query_string": {
          "query": "id:3955974 AND message:welcome-doesnotmatchanything"
        }
      }
    }'

You get the following response. In this example, the message field is "index": "not_analyzed".

    {
      "valid": true,
      "_shards": { "total": 1, "successful": 1, "failed": 0 },
      "explanations": [
        {
          "index": "YOUR_INDEX",
          "valid": true,
          "explanation": "+id:3955974 +message:welcome-doesnotmatchanything"
        }
      ]
    }

See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-validate.html#search-validate

I hope that helps.

Regards,
Jun

2014-11-07 9:47 GMT+09:00 Dave Reed <infinit...@gmail.com>:
> I have a document with a field message that contains the following text (truncated):
>
> Welcome to test.com!
>
> The assertion field is mapped to have an analyzer that breaks that string into the following tokens:
>
> welcome
> to
> test
> com
>
> But when I search with a query like this:
>
> { "query": { "query_string": { "query": "id:3955974 AND message:welcome-doesnotmatchanything" } } }
>
> To my surprise, it finds the document (3955974 is the document id). The dash and everything after it seem to be ignored, because it does not matter what I put there; it still matches the document.
>
> I've tried escaping it:
>
> { "query": { "query_string": { "query": "id:3955974 AND message:welcome\\-doesnotmatchanything" } } }
>
> (note the double escape, since it has to be escaped for the JSON too)
>
> But that makes no difference. I still get 1 matching document. If I put it in quotes it works:
>
> { "query": { "query_string": { "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\"" } } }
>
> It works, meaning it matches 0 documents, since that document does not contain the doesnotmatchanything token. That's great, but I don't understand why the unquoted version does not work. This query is being generated, so I can't easily just decide to start quoting it, and I can't always do that anyway, since the user is sometimes going to use wildcards, which can't be quoted if I want them to function.
>
> I was under the assumption that an EscapedUnquotedString is the same as a quoted unescaped string (in other words, foo:a\b\c === foo:abc, assuming all special characters are escaped in the unquoted version). I'm only on ES 1.01, but I don't see anything new or any changes that would have impacted this behavior in later versions.
>
> Any insights would be helpful! :)

-- 
Jun Ohtani
blog: http://blog.johtani.info
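Jun's _validate check can also be scripted. A minimal sketch using only Python's standard library; the base URL `http://localhost:9200` is an assumption for a local node, the helper names are hypothetical, and the sketch sends POST with a JSON body rather than the GET used in the curl example.

```python
import json
import urllib.request


def validate_request(base_url, index, query_string):
    """Build (but do not send) a _validate/query?explain request."""
    body = {"query": {"query_string": {"query": query_string}}}
    return urllib.request.Request(
        f"{base_url}/{index}/_validate/query?explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explanations(request):
    """Send the request and return the explanation strings from the reply."""
    with urllib.request.urlopen(request) as resp:
        result = json.load(resp)
    return [e["explanation"] for e in result.get("explanations", [])]


# Usage against a running node (not executed here):
# req = validate_request("http://localhost:9200", "YOUR_INDEX",
#                        "message:welcome-doesnotmatchanything")
# print(explanations(req))
```

Separating request construction from sending keeps the query-building part testable without a cluster.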