Re: slow filter execution

2014-07-31 Thread Kireet Reddy
Quick update, I found that if I explicitly set _cache to true, things seem 
to work more as expected, i.e. subsequent executions of the query sped up. 
I looked at DateFieldMapper.rangeFilter() and to me it looks like if a 
number is passed, caching will be disabled unless it's explicitly set to 
true. Not sure if this has been fixed in 1.3.x yet or not. This meshes with 
my observed behavior. 
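
For anyone who wants to try the same workaround, here is a minimal sketch of a range filter with a numeric bound and `_cache` set explicitly. It is written as a Python dict that serializes to the 1.x query DSL. The helper name `cached_range_filter` is hypothetical; the field name and timestamp are the ones from my query, and the placement of `_cache` next to the field name follows the example Clinton posted.

```python
import json


def cached_range_filter(field, upper_bound_ms, cache=True):
    """Build a range filter body with an explicit _cache flag.

    Hypothetical helper for illustration only. _cache is placed next to
    the field name inside "range", as in the query from this thread.
    """
    return {"range": {field: {"to": upper_bound_ms}, "_cache": cache}}


range_filter = cached_range_filter("published", 1406064191883)
print(json.dumps(range_filter, indent=2))
```

The resulting dict can be dropped into the `must` array of the bool filter in place of the original range clause.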


Re: slow filter execution

2014-07-31 Thread Clinton Gormley
On 31 July 2014 20:25, Kireet Reddy kir...@feedly.com wrote:

 Quick update, I found that if I explicitly set _cache to true, things seem
 to work more as expected, i.e. subsequent executions of the query sped up.
 I looked at DateFieldMapper.rangeFilter() and to me it looks like if a
 number is passed, caching will be disabled unless it's explicitly set to
 true. Not sure if this has been fixed in 1.3.x yet or not. This meshes with
 my observed behavior.


Nice catch!!!

That's a notable bug!  Opened here:
https://github.com/elasticsearch/elasticsearch/issues/7114

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPt3XKSuS6f28kmXT_b3LFvCZJG1-_ui2D%3Drf-rojn4x6Mf%2Brw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: slow filter execution

2014-07-30 Thread Kireet Reddy
For my test case it's the same every time. In the real query it will
change every time, but I planned to not cache this filter and have a less
granular date filter in the bool filter that would be cached. However while
debugging I noticed slowness with the date range filters even while testing
with the same value repeatedly.
On Jul 29, 2014 10:49 PM, David Pilato da...@pilato.fr wrote:

 Any chance your filter value changes for every call?
 Or are you using exactly the same value each time?





Re: slow filter execution

2014-07-30 Thread David Pilato
Maybe a stupid question: why did you put that filter inside a query and 
not within the same filter you have at the end?

For my test case it's the same every time. In the real query it will 
 change every time, but I planned to not cache this filter and have a less 
 granular date filter in the bool filter that would be cached. However while 
 debugging I noticed slowness with the date range filters even while testing 
 with the same value repeatedly.




Re: slow filter execution

2014-07-30 Thread Clinton Gormley
Don't use the `and` filter - use the `bool` filter instead.  They have
different execution modes and the `bool` filter works best with bitset
filters (but also knows how to handle non-bitset filters like geo etc).

Just remove the `and`, `or` and `not` filters from your DSL vocabulary.

Also, not sure why you are ANDing with a match_all filter - that doesn't
make much sense.

Depending on which version of ES you're using, you may be encountering a
bug in the filtered query which ended up always running the query first,
instead of the filter. This was fixed in v1.2.0
https://github.com/elasticsearch/elasticsearch/issues/6247 .  If you are on
an earlier version you can force filter-first execution manually by
specifying a strategy of random_access_100.  See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_filter_strategy

In summary, (and taking your less granular datetime clause into account)
your query would be better written as:

GET /_search
{
  "query": {
    "filtered": {
      "strategy": "random_access_100",   # pre 1.2 only
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "source_id": [ "s1", "s2", "s3" ]
              }
            },
            {
              "range": {
                "published": {
                  "gte": "now-1d/d"   # coarse grained, cached
                }
              }
            },
            {
              "range": {
                "published": {
                  "gte": "now-30m"   # fine grained, not cached, could use fielddata too
                },
                "_cache": false
              }
            }
          ]
        }
      }
    }
  }
}
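
If you build the request in code, the same shape can be sketched as below. This is only an illustration under a few assumptions: the function name `build_filtered_query` and its parameters are hypothetical, and the `strategy` key is added only when you say you are on a pre-1.2 version.

```python
import json


def build_filtered_query(source_ids, coarse="now-1d/d", fine="now-30m", pre_1_2=False):
    """Build the filtered query above: terms + coarse cached range + fine uncached range.

    Hypothetical helper; the field names are the ones used in this thread.
    """
    must = [
        {"terms": {"source_id": list(source_ids)}},
        # Coarse-grained range: rounded to the day, so it can be cached.
        {"range": {"published": {"gte": coarse}}},
        # Fine-grained range: changes constantly, so caching is switched off.
        {"range": {"published": {"gte": fine}, "_cache": False}},
    ]
    filtered = {"filter": {"bool": {"must": must}}}
    if pre_1_2:
        # Force filter-first execution on versions before 1.2.0.
        filtered["strategy"] = "random_access_100"
    return {"query": {"filtered": filtered}}


body = build_filtered_query(["s1", "s2", "s3"])
print(json.dumps(body, indent=2))
```

The point of the split is that the coarse clause does the bulk of the narrowing from the cache, and the fine clause only has to trim the remainder.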





On 30 July 2014 10:55, David Pilato da...@pilato.fr wrote:

 May be a stupid question: why did you put that filter inside a query and
 not within the same filter you have at the end?


 For my test case it's the same every time. In the real query it will
 change every time, but I planned to not cache this filter and have a less
 granular date filter in the bool filter that would be cached. However while
 debugging I noticed slowness with the date range filters even while testing
 with the same value repeatedly.



Re: slow filter execution

2014-07-30 Thread Kireet Reddy
Thanks for the detailed reply. 

I am a bit confused about `and` vs `bool` filter execution. I read this post 
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/ on 
the elasticsearch blog. From that, I thought the bool filter would work by 
basically creating a bitset for the entire segment(s) being examined. If 
the filter value changes every time, will this still be cheaper than an `and` 
filter that will just examine the matching docs? My segments can be very 
big, and this query, for example, only matched one document.
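
To make the question concrete, here is a toy model of the two execution styles as I understand them from the blog post. It is not Elasticsearch's actual implementation: the function names, the predicates, and the evaluation counter are all made up for illustration, and it ignores caching entirely.

```python
def bitset_and(num_docs, predicates):
    """Bool-style: build one bitset per filter over the whole segment, then AND them."""
    result = (1 << num_docs) - 1  # start with every doc set
    evaluations = 0
    for pred in predicates:
        bits = 0
        for doc in range(num_docs):
            evaluations += 1
            if pred(doc):
                bits |= 1 << doc
        result &= bits
    matches = [doc for doc in range(num_docs) if (result >> doc) & 1]
    return matches, evaluations


def doc_by_doc_and(num_docs, predicates):
    """And-style: each later filter only checks docs that passed the earlier ones."""
    candidates = range(num_docs)
    evaluations = 0
    for pred in predicates:
        survivors = []
        for doc in candidates:
            evaluations += 1
            if pred(doc):
                survivors.append(doc)
        candidates = survivors
    return list(candidates), evaluations


num_docs = 1000
predicates = [lambda doc: doc == 42, lambda doc: doc < 500]
print(bitset_and(num_docs, predicates))      # ([42], 2000)
print(doc_by_doc_and(num_docs, predicates))  # ([42], 1001)
```

In this toy, the bitset approach touches every document for every filter, which is the cost I am worried about when the bitset cannot be reused across requests.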

There is no match_all query filter; there is a match query filter on a 
field named all. :)

Based on your feedback, I moved all filters, including the query filter, 
into the bool filter. However, it didn't change things: the query is an 
order of magnitude slower with the range filter, unless I set execution to 
fielddata. I am using 1.2.2; I tried the strategy anyway and it didn't 
make a difference.

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "source_id": ["s1", "s2", "s3"]
              }
            },
            {
              "query": {
                "match": {
                  "all": {
                    "query": "foo"
                  }
                }
              }
            },
            {
              "range": {
                "published": {
                  "to": 1406064191883
                }
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "crawlDate": {
        "order": "desc"
      }
    }
  ]
}


slow filter execution

2014-07-29 Thread Kireet Reddy
One of my queries has been consistently taking 500ms-1s and I can't figure 
out why. Here is the query 
https://gist.github.com/anonymous/d98fb2c46d9a7755e882 (it looks a bit 
strange as I have removed things that didn't seem to affect execution 
time). When I remove the range filter, the query consistently takes < 10ms. 
The query itself only returns 1 hit with or without the range filter, so I 
am not sure why simply including this filter adds so much time. My nodes 
are not experiencing any filter cache evictions. I also tried moving it to 
the bool section with no luck. Changing execution to fielddata does 
improve execution time to < 10ms though. Since I am sorting on the same 
field, I suppose this should be fine. But I would like to understand why 
the slowdown occurs. The published field is a date type and has eager field 
data loading enabled.
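
For reference, this is roughly what I mean by changing execution, sketched as a Python dict. It assumes the 1.x range filter takes `execution` next to the field name, in the same position as `_cache`; the helper name `range_filter` is hypothetical and the bound is only an example value.

```python
import json


def range_filter(field, upper_bound_ms, execution=None):
    """Build a range filter body, optionally with an explicit execution mode.

    Hypothetical helper. "execution" is assumed to sit beside the field
    name inside "range"; leaving it out keeps the default behavior.
    """
    body = {field: {"to": upper_bound_ms}}
    if execution is not None:
        body["execution"] = execution
    return {"range": body}


default_filter = range_filter("published", 1406064191883)
fielddata_filter = range_filter("published", 1406064191883, execution="fielddata")
print(json.dumps(fielddata_filter, indent=2))
```

Only the second form gave me the fast timings.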

Thanks
Kireet




Re: slow filter execution

2014-07-29 Thread David Pilato
Any chance your filter value changes for every call?
Or are you using exactly the same value each time?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



