Re: Query_string search containing a dash has unexpected results

2014-11-11 Thread joergpra...@gmail.com

If you want to translate battle-axe into battle axe, note that the
correct method would be to introduce a phrase search with slop 0. The AND
operator may also work in most cases, but the word positions will be lost:
you get a less precise search that matches docs containing battle and axe
anywhere in the field.
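
The phrase-search idea could be sketched roughly as below. This is a minimal Python sketch and not code from the thread: the helper name phrase_query is hypothetical, and it assumes a match_phrase query with slop 0 on the message field, with every run of non-alphanumeric characters in the user input replaced by a single space first.

```python
import json
import re


def phrase_query(field, user_text, slop=0):
    """Build a match_phrase request body for user_text (hypothetical helper)."""
    # Replace runs of non-alphanumeric characters with one space,
    # so "battle-axe" becomes the phrase "battle axe".
    phrase = re.sub(r"[\W_]+", " ", user_text).strip()
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


body = phrase_query("message", "battle-axe")
print(json.dumps(body, indent=2))
```

With slop 0 the terms have to appear next to each other and in order, which is stricter than joining them with AND. Note that a phrase query does not expand wildcards, so this would not cover the trailing-wildcard case on its own.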

Jörg

--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/4d64842d-6374-465d-b261-452d845a3985%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/4d64842d-6374-465d-b261-452d845a3985%40googlegroups.com?utm_medium=emailutm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed

I'm not using the standard analyzer, I'm using a pattern analyzer that will
break the text on all non-word characters, like this:

"analyzer": {
    "letterordigit": {
        "type": "pattern",
        "pattern": "[^\\p{L}\\p{N}]+"
    }
}

I have verified that the message field is being broken up into the tokens I
expect (example in my first post).

So when I run a search for message:welcome-doesnotmatch, I'm expecting that
string to be broken into tokens like so:

welcome
doesnotmatch

And for the search to therefore find 0 documents. But it doesn't -- it
finds 1 document, the document that contains my sample message, which does
not include the token doesnotmatch.
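
The expected tokenization can be approximated outside Elasticsearch. The following is only a rough Python sketch under stated assumptions: Python's re module has no \p{L} or \p{N} classes, so [\W_]+ stands in for the pattern above, and the text is lowercased on the assumption that the pattern analyzer lowercases its tokens (the tokens listed in this thread are all lowercase). The function name tokenize is made up for illustration.

```python
import re

# Approximation of the pattern [^\p{L}\p{N}]+ : any run of characters
# that are neither letters nor digits acts as a separator.
SEPARATORS = re.compile(r"[\W_]+")


def tokenize(text):
    """Lowercase text and split it on runs of non-letter, non-digit characters."""
    return [tok for tok in SEPARATORS.split(text.lower()) if tok]


print(tokenize("Welcome to test.com!"))
print(tokenize("welcome-doesnotmatch"))
```

Under these assumptions the search term produces two tokens, and only the first of them occurs in the sample document.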

So why on Earth would this search match that document? It is behaving as if
everything after the - is completely ignored. It does not matter what I
put there, it will still match the document.

This is coming up because an end user is searching for a hyphenated word,
like battle-axe, and it's matching a document that does not contain the
word axe at all.

On Friday, November 7, 2014 12:24:30 AM UTC-8, Jun Ohtani wrote:

Hi Dave,

I think the reason is that your message field uses the standard analyzer.
The standard analyzer splits text on "-".
If you change the analyzer to the whitespace analyzer, the query matches 0 documents.

The _validate API is useful for checking the exact query that gets executed.
Example request:

curl -XGET /YOUR_INDEX/_validate/query?explain -d'
{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}'

You get the following response. In this example, the message field is mapped
with "index": "not_analyzed".
{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "YOUR_INDEX",
      "valid": true,
      "explanation": "+id:3955974 +message:welcome-doesnotmatchanything"
    }
  ]
}
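
For anyone scripting this check, the request and response handling might look like the Python sketch below. It only builds the request pieces and reads a response of the shape shown above; nothing is sent over the network, and the helper names validate_request and explanations are hypothetical.

```python
import json


def validate_request(index, query_string):
    """Return (method, path, body) for a _validate/query call with explain."""
    path = "/{}/_validate/query?explain".format(index)
    body = json.dumps({"query": {"query_string": {"query": query_string}}})
    return "GET", path, body


def explanations(response):
    """Extract the explanation strings from a parsed _validate response."""
    return [item["explanation"] for item in response.get("explanations", [])]


method, path, body = validate_request(
    "YOUR_INDEX", "id:3955974 AND message:welcome-doesnotmatchanything"
)

# Sample response copied from the message above.
sample = {
    "valid": True,
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "explanations": [
        {
            "index": "YOUR_INDEX",
            "valid": True,
            "explanation": "+id:3955974 +message:welcome-doesnotmatchanything",
        }
    ],
}
print(method, path)
print(explanations(sample))
```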

See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-validate.html#search-validate

I hope that those help you out.

Regards,
Jun

2014-11-07 9:47 GMT+09:00 Dave Reed infin...@gmail.com:

I have a document with a field message, that contains the following
text (truncated):

Welcome to test.com!

The message field is mapped to have an analyzer that breaks that string
into the following tokens:

welcome
to
test
com

But, when I search with a query like this:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}

To my surprise, it finds the document (3955974 is the document id). The
dash and everything after it seems to be ignored, because it does not
matter what I put there, it will still match the document.

I've tried escaping it:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome\\-doesnotmatchanything"
    }
  }
}
(note the double escape since it has to be escaped for the JSON too)

But that makes no difference. I still get 1 matching document. If I put
it in quotes it works:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\""
    }
  }
}

It works, meaning it matches 0 documents, since that document does not
contain the doesnotmatchanything token. That's great, but I don't
understand why the unquoted version does not work. This query is being
generated so I can't easily just decide to start quoting it, and I can't
always do that anyway since the user is sometimes going to use wildcards,
which can't be quoted if I want them to function. I was under the
assumption that an EscapedUnquotedString is the same as a quoted unescaped
string (in other words, foo:a\b\c === foo:"abc", assuming all special
characters are escaped in the unquoted version).

I'm only on ES 1.01, but I don't see anything new or changes that would
have impacted this behavior in later versions.

Any insights would be helpful! :)

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana

Can you run the validate query and post the output? That would be helpful.
amish

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed

Yes of course :) Here we go:

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "index_v1",
      "valid": true,
      "explanation": "message:welcome message:doesnotmatch"
    }
  ]
}

That's it.
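
One way to read such an explanation string: in Lucene's query syntax a clause prefixed with + is required, a clause prefixed with - is prohibited, and a clause with no prefix is optional. The Python sketch below splits an explanation on whitespace and sorts the clauses accordingly. It is a simplification (the helper name classify_clauses is hypothetical, and nested parentheses or phrases containing spaces are not handled).

```python
def classify_clauses(explanation):
    """Split an explanation into required, optional and prohibited clauses."""
    required, optional, prohibited = [], [], []
    for clause in explanation.split():
        if clause.startswith("+"):
            required.append(clause[1:])
        elif clause.startswith("-"):
            prohibited.append(clause[1:])
        else:
            optional.append(clause)
    return required, optional, prohibited


print(classify_clauses("message:welcome message:doesnotmatch"))
print(classify_clauses("+id:3955974 +message:welcome-doesnotmatchanything"))
```

In the output posted here both clauses carry no + prefix, so neither token is required on its own and a document containing only welcome can still match.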



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed

Also interesting... if I run the query with explain=true, I see information 
in the details about the welcome token, but there's no mention at all 
about the doesnotmatch token. I guess it wouldn't mention it though, 
since if it did, the document shouldn't match in the first place.

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana

I created a test index using your pattern and I am seeing the appropriate
behaviour.
I am assuming you are using the same analyzer for search/query, as well as
ensuring that your default operator is AND.
Note that the analyzer will break welcome-doesnotmatchanything into two
tokens joined with OR, so your document will match unless you use AND.
amish

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed

My default operator doesn't matter, if I understand it correctly, because
I'm specifying the operator explicitly. Also, I can reproduce this behavior
using a single search term, so there's no operator to speak of. Unless
you're saying that the default operator applies to a single-term query if
it is broken into tokens?

Note that the analyzer will break welcome-doesnotmatchanything into two
tokens joined with OR, so your document will match unless you use AND.


This concerns me... my search looks like:

message:welcome-doesnotmatchanything

I cannot break that into an AND. The entire thing is a value provided by 
the end user. You're saying I should on the app side break the string they 
entered into tokens and join them with ANDs? That doesn't seem viable...

Let me back up and say what I'm expecting the user to be able to do. 
There's a single text box where they can enter a search query, with the 
following rules:
1. The user may use a trailing wildcard, e.g. foo*
2. The user may enter multiple terms separated by a space. Only documents 
containing all of the terms will match.
3. The user might enter special characters, such as in battle-axe, simply 
because that is what they think they should search for, which should match 
documents containing battle and axe (the same as a search for battle 
axe).

To that end, I am taking their search string and forming a search like this:

message:searchterm AND...

Where the string is split on spaces and joined with the AND clauses. For 
each individual part of the search phrase, I take care of escaping special 
characters (except * since I am allowing them to use wildcards). For 
example, if they entered foo bar!, I would generate this query:

message:foo AND message:bar\!

The problem is they are entering battle-axe, causing me to generate this:

message:battle\-axe

But that ends up being the same as:

(message:battle OR message:axe)

I guess that is what I was not expecting. Because of this behavior, I have 
to know from my app point of view what tokens I should be splitting the 
original string on, so that I can join them back together with ANDs. But 
that means basically reimplementing the tokenizer on my end, does it not? 
There must be a better way? Like specifying I want those terms to be joined 
with ANDs instead?
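
The query generation described above, combined with the default_operator setting discussed in this thread, could be sketched in Python as follows. This is an illustration only and not the application's actual code: the helper name build_query and the exact list of escaped characters are assumptions, and * is deliberately left unescaped so trailing wildcards keep working.

```python
import re

# Characters escaped with a backslash. "*" is left out on purpose
# so that a trailing wildcard such as foo* still works.
SPECIAL = re.compile(r'([+\-=&|><!(){}\[\]^"~?:\\/])')


def build_query(field, user_input):
    """Split user input on whitespace, escape each term, and join with AND."""
    parts = []
    for term in user_input.split():
        escaped = SPECIAL.sub(r"\\\1", term)
        parts.append("{}:{}".format(field, escaped))
    return {
        "query": {
            "query_string": {
                "query": " AND ".join(parts),
                # Assumption based on this thread: with AND as the default
                # operator, the tokens the analyzer produces from a single
                # term such as battle\-axe are also joined with AND.
                "default_operator": "AND",
            }
        }
    }


print(build_query("message", "foo bar!")["query"]["query_string"]["query"])
print(build_query("message", "battle-axe")["query"]["query_string"]["query"])
```

The point of the design is that the application never needs to know how the analyzer tokenizes a term: it only escapes and joins the space-separated terms, and the default operator takes care of the tokens inside each term.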

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed

Ok... specifying default_operator: AND worked

In that case, I'd like to say that the docs on that option are incomplete 
or confusing. It says:

The default operator used if no explicit operator is specified. For example, 
with a default operator of OR, the query capital of Hungary is translated 
to capital OR of OR Hungary, and with default operator of AND, the same 
query is translated to capital AND of AND Hungary. The default value is OR.

That's all well and good, but my query does not have multiple terms like 
that. I have a single term for a single field. The default operator is 
applying to the resulting tokens of that, after they are generated by the 
analyzer. I assumed that the default operator applied at the level of the 
query being parsed and that had nothing at all to do with the analyzer. 
Making that clearer could have saved me a lot of time :)
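
The behaviour described here can be modelled in a few lines of Python. This is a toy model of the matching logic as reported in this thread, not Elasticsearch code: the function name matches is made up, and the token lists are the ones from the earlier messages.

```python
def matches(doc_tokens, query_tokens, default_operator="OR"):
    """Toy model: combine per-token hits with the default operator."""
    present = [tok in doc_tokens for tok in query_tokens]
    if default_operator == "AND":
        return all(present)
    return any(present)


# Tokens of the indexed message "Welcome to test.com!"
doc = {"welcome", "to", "test", "com"}
# Tokens the analyzer produces from the single term welcome-doesnotmatchanything
query = ["welcome", "doesnotmatchanything"]

print(matches(doc, query, "OR"))
print(matches(doc, query, "AND"))
```

With OR a single matching token is enough, which is why the document was found; with AND every token produced from the term has to be present.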

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana

No, I am not saying that. I am saying this:
GET my_index_v1/mytype/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "welcome-doesnotmatchanything",
      "default_operator": "AND"
    }
  }
}

Here I will not get a match, as expected. If I do not specify it, then OR is
the default operator and the document will match.
amish

Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Dave Reed

Yes, and this was the key, thank you so much. But see my reply above about
the docs on that param being confusing. That was really the source of the
problem for me.

Re: Query_string search containing a dash has unexpected results

2014-11-07 Thread Jun Ohtani

Hi Dave,

I think the reason is that your message field uses the standard analyzer.
The standard analyzer splits text on "-".
If you change the analyzer to the whitespace analyzer, the query matches 0 documents.

The _validate API is useful for checking the exact query that gets executed.
Example request:

curl -XGET /YOUR_INDEX/_validate/query?explain -d'
{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}'

See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-validate.html#search-validate

I hope that those help you out.

Regards,
Jun

--
---
Jun Ohtani
blog : http://blog.johtani.info
