Re: How to get Elasticsearch boolean match working for multiple fields

2015-05-08 Thread Allan Mitchell
Dominic

The usual convention is that Field is analyzed and Field.raw is not
analyzed. I'm not sure why you would have both as not_analyzed, given that
they would then do the same thing, all else being equal.

When performing your original query above on fields I know are not_analyzed
I get no results because there are no strings in the fields that match
those terms exactly.
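
A rough way to picture this, as a Python sketch rather than anything Elasticsearch itself runs: a not_analyzed field is indexed as a single term holding the whole value, so a query string only matches when it equals that value exactly. The helper name `exact_term_match` is made up for illustration.

```python
# Simplified model of a not_analyzed string field: the entire value is one term.
def exact_term_match(field_value: str, query_text: str) -> bool:
    """Return True only when the query equals the whole stored value."""
    return field_value == query_text


message = "Failed password for some user or another"
path = "/var/log/secure"

# A fragment of the message is not equal to the full stored term.
print(exact_term_match(message, "Failed password for"))  # False
# The path query happens to equal the full stored term.
print(exact_term_match(path, "/var/log/secure"))         # True
```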

I could of course look to do a regex query

GET /testingindex/mytesttype/_search
{
  "query": {
    "bool": {
      "must": [
        { "regexp": { "message": ".*Failed password for.*" } },
        { "regexp": { "path": ".*/var/log/secure.*" } }
      ]
    }
  }
}
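
A regexp query has to match the whole indexed term, which is why the patterns above carry `.*` on both sides. The Python sketch below imitates that anchoring with `re.fullmatch`. Python's regex dialect is not identical to Lucene's, and `regexp_term_match` is a hypothetical helper, so treat this as an approximation only.

```python
import re


def regexp_term_match(term: str, pattern: str) -> bool:
    """Mimic an anchored regexp: the pattern must cover the entire term."""
    return re.fullmatch(pattern, term) is not None


# On a not_analyzed field the term is the whole field value.
message = "Failed password for some user or another"
path = "/var/log/secure"

print(regexp_term_match(message, "Failed password for"))      # False
print(regexp_term_match(message, ".*Failed password for.*"))  # True
print(regexp_term_match(path, ".*/var/log/secure.*"))         # True
```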





On 8 May 2015 at 15:03, Dominic Nicholas dominic.s.nicho...@gmail.com
wrote:

 Hi Allan, I really appreciate the thoughtful response. One comment before
 I try what you are suggesting: our path and message field mappings
 indicate not_analyzed, and we don't want to change them at this point.
 Someone suggested using the .raw versions of the fields (path.raw and
 message.raw), which does work. However, it leaves me with the question: if
 the original field mappings indicate the fields are not_analyzed, why is it
 necessary to use the .raw version?
 Cheers
 Dom

 On Fri, May 8, 2015 at 6:37 AM, Allan Mitchell casfanal...@gmail.com
 wrote:

 Hi

 Have a look at the below and see if it is what you want.

 DELETE /testingindex

 PUT /testingindex
 {
   "settings": {
     "number_of_shards": 1
   },
   "mappings": {
     "mytesttype": {
       "_source": { "enabled": false },
       "properties": {
         "message": { "type": "string", "index": "analyzed" },
         "path": { "type": "string", "index": "analyzed" }
       }
     }
   }
 }

 POST /testingindex/mytesttype/1
 {
   "message": "Failed password for some user or another",
   "path": "/wrong/path/"
 }
 POST /testingindex/mytesttype/2
 {
   "message": "Not the right message but the right path",
   "path": "/var/log/secure"
 }
 POST /testingindex/mytesttype/3
 {
   "message": "Failed password for some user or another",
   "path": "/var/log/secure"
 }
 POST /testingindex/mytesttype/4
 {
   "message": "Nothing is right here",
   "path": "/wrong/path/too"
 }

 GET /testingindex/mytesttype/_search

 GET /testingindex/mytesttype/_search
 {
   "query": {
     "bool": {
       "must": [
         { "match_phrase": { "message": "Failed password for some" } },
         { "match_phrase": { "path": "/var/log/secure" } }
       ]
     }
   }
 }
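
To see why only document 3 should come back from the bool/must query above, here is a simplified Python model. It assumes analysis roughly means lowercasing and splitting on non-alphanumeric characters (the real standard analyzer does more), and that match_phrase needs the query tokens to appear consecutively and in order. All function names are made up for this sketch.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(field_value, phrase):
    """True if the phrase tokens occur consecutively inside the field tokens."""
    tokens, wanted = analyze(field_value), analyze(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )


def bool_must(doc, clauses):
    """bool/must semantics: every clause has to match the document."""
    return all(phrase_match(doc[field], phrase) for field, phrase in clauses)


# The four test documents from the example above.
docs = {
    1: {"message": "Failed password for some user or another", "path": "/wrong/path/"},
    2: {"message": "Not the right message but the right path", "path": "/var/log/secure"},
    3: {"message": "Failed password for some user or another", "path": "/var/log/secure"},
    4: {"message": "Nothing is right here", "path": "/wrong/path/too"},
}

clauses = [("message", "Failed password for some"), ("path", "/var/log/secure")]
hits = [doc_id for doc_id, doc in docs.items() if bool_must(doc, clauses)]
print(hits)  # [3]
```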


Re: How to get Elasticsearch boolean match working for multiple fields

2015-05-08 Thread Dominic Nicholas
Hi - thanks again - I was misunderstanding the following :

"path" : {
  "type" : "string",
  "norms" : {
    "enabled" : false
  },
  "fields" : {
    "raw" : {
      "type" : "string",
      "index" : "not_analyzed",
      "ignore_above" : 256
    }
  }
}

This says that path itself is analyzed (with the default analyzer, since
there is no 'index: not_analyzed' on it), but that its sub-field 'raw' is not
analyzed. One solution for me will be to simply use the path.raw field
instead of the path field. I'll also try the regexp. Thanks again for the help!
Dom
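
One way the path.raw idea could look as a request body, sketched in Python with only the standard library. It keeps match_phrase on the analyzed message field and uses a term query on path.raw for the exact comparison. Choosing term here is an assumption, not something stated in the thread, and `build_query` is a hypothetical helper.

```python
import json


def build_query(message_phrase, exact_path):
    """Build a bool/must body: phrase match on message, exact match on path.raw."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Analyzed field: phrase matching on tokens.
                    {"match_phrase": {"message": message_phrase}},
                    # not_analyzed sub-field: the whole value must be equal.
                    {"term": {"path.raw": exact_path}},
                ]
            }
        }
    }


body = build_query("Failed password for", "/var/log/secure")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a _search request, for example with curl -d.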


How to get Elasticsearch boolean match working for multiple fields

2015-05-07 Thread Dominic Nicholas


Hi,

I need some expert guidance on trying to get a bool match working. I'd like 
the query to only return a successful search result if *both* 'message' 
matches 'Failed password for', *and* 'path' matches '/var/log/secure'.

This is my query :

curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
  "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
  "query" : {
    "bool" : {
      "must" : [
        { "match_phrase" : { "message" : "Failed password for" } },
        { "match_phrase" : { "path" : "/var/log/secure" } }
      ]
    }
  }
}'

Here is the start of the output from the search :

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 13.308596,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 13.308596,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    }, ...

The problem is if I change '/var/log/secure' to just 'var' say, and run the 
query, I still get a result, just with a lower score. I understood the 
bool...must construct meant both match terms here would need to be 
successful. What I'm after is *no* result if 'path' doesn't exactly match 
'/var/log/secure'...

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 10.354593,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 10.354593,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    },...
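
A likely explanation for the 'var' result, sketched in Python under the assumption that the path field is analyzed into separate lowercase tokens (a simplification of the standard analyzer): the phrase 'var' is then one of the indexed tokens, so the clause still matches. The names `analyze` and `phrase_in_tokens` are hypothetical.

```python
import re


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_in_tokens(tokens, phrase_tokens):
    """True if phrase_tokens appear consecutively, in order, within tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(
        tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1)
    )


stored_path = "/var/log/secure"
path_tokens = analyze(stored_path)

print(path_tokens)                                    # ['var', 'log', 'secure']
print(phrase_in_tokens(path_tokens, analyze("var")))  # True: 'var' is a token
print(stored_path == "var")                           # False: no exact match
```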

I checked the mappings for these fields to confirm that they are not analyzed:

curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'

I think these fields are not analyzed, so I believe the search text will not
be analyzed either (based on some training documentation I read recently from
Elasticsearch). Here is a snippet of the _mapping output for this index:

  
  "message" : {
    "type" : "string",
    "norms" : {
      "enabled" : false
    },
    "fields" : {
      "raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 256
      }
    }
  },
  "path" : {
    "type" : "string",
    "norms" : {
      "enabled" : false
    },
    "fields" : {
      "raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 256
      }
    }
  },
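
Reading a mapping like the one above can be sketched in Python as follows. The sketch assumes that a string field with no explicit index setting is analyzed (the default for string fields in Elasticsearch versions of that era) and reports the setting for each field and for each sub-field listed under fields. `index_settings` is a made-up helper, and the dict simply mirrors the snippet above.

```python
def index_settings(properties):
    """Map each field and sub-field name to its effective 'index' setting."""
    result = {}
    for name, spec in properties.items():
        # Assumption: no explicit "index" means the string field is analyzed.
        result[name] = spec.get("index", "analyzed")
        for sub_name, sub_spec in spec.get("fields", {}).items():
            result[f"{name}.{sub_name}"] = sub_spec.get("index", "analyzed")
    return result


raw_subfield = {"type": "string", "index": "not_analyzed", "ignore_above": 256}
properties = {
    "message": {
        "type": "string",
        "norms": {"enabled": False},
        "fields": {"raw": dict(raw_subfield)},
    },
    "path": {
        "type": "string",
        "norms": {"enabled": False},
        "fields": {"raw": dict(raw_subfield)},
    },
}

for field, setting in sorted(index_settings(properties).items()):
    print(f"{field}: {setting}")
# message: analyzed
# message.raw: not_analyzed
# path: analyzed
# path.raw: not_analyzed
```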
  

Where am I going wrong (in a bunch of places, I'm sure), and what am I
misunderstanding here (probably a lot!)?

Any help would be much appreciated!

Thanks

-- 
Please update your bookmarks! We moved to https://discuss.elastic.co/
--- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0470f9df-8d9a-48ef-9dbd-a90c8f2db194%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to get Elasticsearch boolean match working for multiple fields

2015-05-07 Thread Jason Wee
What ES version is that?

