Filter caching is implemented in ES, not in Lucene. For example,
org.elasticsearch.common.lucene.search.CachedFilter is a class that
implements cached filters on top of the Lucene Filter class.

The "format" is not limited to bitsets. The Lucene filter instance is cached,
regardless of whether it is backed by doc sets, bit sets, or something else.
ES code extends Lucene filters with several methods for fast evaluation and
traversal.
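
As a rough illustration of the caching idea, here is a simplified Python model. It is not the actual ES code: `TermFilter`, `CachingFilter`, `doc_id_set`, and the list-of-dicts segment are hypothetical names made up for this sketch. The wrapper stores whatever doc-ID set the inner filter produced and hands it back on later calls, without caring how that set is represented.

```python
class TermFilter:
    """Hypothetical inner filter: matches docs whose field equals a value."""

    def __init__(self, field, value):
        self.field = field
        self.value = value
        self.evaluations = 0  # how often the filter was actually computed

    def doc_id_set(self, segment):
        # A segment is modelled as a list of dicts; the doc ID is the list index.
        self.evaluations += 1
        return frozenset(
            doc_id for doc_id, doc in enumerate(segment)
            if doc.get(self.field) == self.value
        )


class CachingFilter:
    """Hypothetical wrapper that caches the inner filter's result per segment."""

    def __init__(self, inner):
        self.inner = inner
        self._cache = {}  # id(segment) -> cached doc-ID set

    def doc_id_set(self, segment):
        key = id(segment)
        if key not in self._cache:
            # Cache miss: evaluate the inner filter once and remember the result.
            self._cache[key] = self.inner.doc_id_set(segment)
        return self._cache[key]


segment = [{"x": "a"}, {"x": "b"}, {"x": "a"}]
inner = TermFilter("x", "a")
cached = CachingFilter(inner)
first = cached.doc_id_set(segment)
second = cached.doc_id_set(segment)  # served from the cache
```

In this model the second call returns the stored set and the inner filter is not evaluated again.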

ES evaluates the filter in the given filter chain order, from outer to
inner (also called "top down").
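
To make the effect of the order concrete, here is a small Python sketch. It assumes an uncached chain that is applied doc-by-doc, as discussed in this thread; `and_filter`, `term_x`, `range_y`, and the sample documents are hypothetical and only loosely mirror the x/y example from the question. Each filter only inspects the documents that survived the filters before it.

```python
def and_filter(docs, predicates):
    """Apply predicates in the listed order and record how many docs each inspects."""
    candidates = list(range(len(docs)))
    inspected = []
    for predicate in predicates:
        inspected.append(len(candidates))
        candidates = [i for i in candidates if predicate(docs[i])]
    return candidates, inspected


def term_x(doc):
    return doc["x"] == "F828477AF7"


def range_y(doc):
    # String comparison, like a "gt" range on a string field.
    return doc["y"] > "CB70V63BD8AE"


docs = [
    {"x": "F828477AF7", "y": "CB70V63BD8AF"},
    {"x": "F828477AF7", "y": "AA00"},
    {"x": "0000000000", "y": "ZZ99"},
    {"x": "1111111111", "y": "DD11"},
]

matches, inspected = and_filter(docs, [term_x, range_y])
matches_rev, inspected_rev = and_filter(docs, [range_y, term_x])
```

Both orders yield the same matches, but with the selective term filter first, the range filter has fewer documents left to look at.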

When a series of boolean filters (i.e. should/must/must_not) is used, they
can be evaluated efficiently by composition. See
org.elasticsearch.common.lucene.search.XBooleanFilter for the composition
algorithm.
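
The following Python sketch is not the XBooleanFilter algorithm, only an illustration of why bitset-backed clauses compose cheaply. It uses plain Python integers as bitsets (bit i set means doc i matches), and it assumes that when should clauses are present, at least one of them has to match. `compose` and `to_doc_ids` are hypothetical helper names.

```python
def compose(all_docs, must=(), should=(), must_not=()):
    """Combine per-clause bitsets with bitwise operations."""
    result = all_docs
    for bits in must:
        result &= bits              # every must clause has to match
    if should:
        any_should = 0
        for bits in should:
            any_should |= bits      # union of all should clauses
        result &= any_should        # at least one should clause has to match
    for bits in must_not:
        result &= ~bits             # no must_not clause may match
    return result


def to_doc_ids(bits):
    """List the positions of the set bits, i.e. the matching doc IDs."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


all_docs = 0b11111111               # eight docs, IDs 0..7
result = compose(
    all_docs,
    must=[0b00111100],              # docs 2, 3, 4, 5
    should=[0b00000110, 0b00100000],  # docs 1, 2 or doc 5
    must_not=[0b00000100],          # doc 2
)
doc_ids = to_doc_ids(result)
```

The whole combination is a handful of AND/OR/NOT operations over the bitsets, with no per-document callback.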

Field data is loaded when a field is used for operations such as filtering
or sorting. The higher the cardinality, the more effort is needed, because
the index is inverted: it maps terms to documents, so building per-document
field data means walking the terms of the field.
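
A minimal Python sketch of that "un-inverting" step, assuming a single-valued string field. The `uninvert` helper and the sample postings are hypothetical and much simpler than the real field data structures.

```python
def uninvert(inverted_index, max_doc):
    """Build a doc -> term array from a term -> doc IDs mapping."""
    field_data = [None] * max_doc
    terms_visited = 0
    for term, doc_ids in inverted_index.items():
        terms_visited += 1          # every distinct term has to be visited
        for doc_id in doc_ids:
            field_data[doc_id] = term
    return field_data, terms_visited


# Hypothetical postings for field "y": term -> IDs of docs containing it.
inverted_y = {
    "AA00": [1],
    "CB70V63BD8AF": [0],
    "DD11": [3],
    "ZZ99": [2],
}
field_data, terms_visited = uninvert(inverted_y, max_doc=4)
```

In this model the number of terms visited equals the cardinality of the field, and the whole field is walked regardless of which documents an earlier filter matched.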

Jörg


On Fri, Mar 20, 2015 at 3:30 AM, Ashish Mishra <laughingbud...@gmail.com>
wrote:

> Not sure I understand the difference between composable vs. cacheable.
> Can filters be cached without using bitsets?  What format are the results
> stored in, if not as bitsets?
>
> In the example below, would the string range field "y" filter be evaluated
> on every document in the index, or just on the documents matching the
> previous field "x" filter?
>
> Also, will "y" field data be loaded for all documents in the index, or
> just for the documents matching the previous filter?
>
>
>
> On Thursday, March 19, 2015 at 3:21:12 AM UTC-7, Jörg Prante wrote:
>>
>> There are several concepts:
>>
>> - filter operation (bool, range/geo/script)
>> - filter composition (composable or not, composable means bitsets are
>> used)
>> - filter caching (ES stores filter results or not, if not cached, ES must
>> walk doc-by-doc to apply filter)
>>
>> #1 says you should take care what kind of inner filter the and/or/not
>> filter uses, and then you should arrange filters in the right order to
>> avoid unnecessary complexity
>> #2 most of the filters are cacheable, but not by default. These docs try
>> to explain how the "and" filter consists of inner filter clauses and what
>> happens because default caching is off. I cannot see how this implies
>> bitsets.
>> #3 correct interpretation
>>
>> The use of bitsets is an indicator of composable filters: the
>> should/must/must_not filters use an internal Lucene bitset implementation
>> for efficient computation.
>>
>> Jörg
>>
>>
>> On Thu, Mar 19, 2015 at 5:58 AM, Ashish Mishra <laughin...@gmail.com>
>> wrote:
>>
>>> I'm trying to optimize filter queries for performance and am slightly
>>> confused by the online docs.  Looking at:
>>>
>>> 1) https://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets
>>> 2) http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-and-filter.html
>>> 3) http://www.elastic.co/guide/en/elasticsearch/guide/current/_filter_order.html
>>>
>>> #1 says that Bool filter uses bitsets, while And/Or/Not does doc-by-doc
>>> matching.
>>> #2 says that And result is optionally cacheable (implying that it uses
>>> bitsets).
>>> #3 says that Bool does doc-by-doc matching if the inner filters are not
>>> cacheable.
>>>
>>> This is confusing. Is there a clear guideline on when bitsets are used?
>>>
>>> Let's say I have two high-cardinality fields, x and y.  Field data for y
>>> is loaded into memory, while x is not.  What is the optimal way to
>>> structure this query?
>>>
>>>       "filter": {
>>>         "and": [
>>>           {
>>>             "term": {
>>>               "x": "F828477AF7",
>>>               "_cache": false  // Don't want to cache since query will not be repeated
>>>             }
>>>           },
>>>           {
>>>             "range": {
>>>               "y": {
>>>                 "gt": "CB70V63BD8AE"  // String range query, should only be executed on result of previous filters
>>>               }
>>>             }
>>>           }
>>>         ]
>>>       }
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/52dd306b-d229-462b-8b3c-b9cb2fff8c5f%
>>> 40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/52dd306b-d229-462b-8b3c-b9cb2fff8c5f%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

