When I enable faceting in Solr, our incoming user queries, for some reason, start being cached in the filterCache. This very quickly causes the instance to run out of memory. We could lower the size of the filterCache, but I feel that would be a band-aid over a far odder problem.

I have been investigating the heap dumps created on our instances when they ran out of memory. Unless YourKit is being dishonest, these dumps show that the filterCache contains BoostedQuery(BooleanQuery(DisjunctionMaxQuery)) objects, each of which contains Term objects that I would not expect to see in the filterCache.

A snapshot of the object graph can be seen here:
http://gbowyer.freeshell.org/filter-cache2.html

As for our index, queries and setup: we have a sharded Solr 3.3 deployment in which some nodes act as aggregators and the rest act as slaves (shards). As recommended, the aggregators dispatch searches to the shards but do not themselves serve any index data.

Most of our search queries differ in their search terms but generally have the following form:

path=/aggregator/ params={fl=docid,pid,score&start=0&q=dat+data+cartridge&fq=+parent_cids:438&fq=+dtype:(1+OR+2)&rows=20

path=/select params={fl=docid,score&start=0&q=polyethylene+bench+storage&enable=true&isShard=true&wt=javabin&fq=+rev_type:[1+TO+2]&fq=+parent_cids:25000500&fq=+dtype:(1+OR+2)&fsv=true&rows=20&version=2

Breaking this down, the fqs target three fields:

* parent_cids - This field contains roughly 1394 terms. There are a few permutations for this field, but I would expect at most ~10000 distinct fqs for it.

* dtype - This field has 2 terms, and we only ever query it as shown above. It is reserved for some future work and would only ever have at most 8 terms.

* rev_type - Similar to dtype, we only have 3 terms in this field.
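One way to check the expectation above against real traffic is to collect the distinct fq values from the request log lines. The following is a minimal sketch, not part of our setup: the class and method names are hypothetical, and it assumes the logged params are URL-encoded (so a `+` stands for a space, making `fq=+dtype:(1+OR+2)` the value `dtype:(1 OR 2)` after trimming). It also tolerates a missing closing brace, as in the lines above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: collects the distinct fq values seen in Solr request log lines,
// assuming params are logged URL-encoded (so '+' means a space).
public class FqCounter {

    static Set<String> distinctFilters(List<String> logLines) {
        Set<String> filters = new TreeSet<>();
        for (String line : logLines) {
            int start = line.indexOf("params={");
            if (start < 0) continue;
            String params = line.substring(start + "params={".length());
            // The closing brace may be missing if the log line was truncated.
            int end = params.indexOf('}');
            if (end >= 0) params = params.substring(0, end);
            for (String pair : params.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0 || !pair.substring(0, eq).equals("fq")) continue;
                String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                filters.add(value.trim()); // drop the leading space left by the decoded '+'
            }
        }
        return filters;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "path=/aggregator/ params={fl=docid,pid,score&start=0&q=dat+data+cartridge&fq=+parent_cids:438&fq=+dtype:(1+OR+2)&rows=20",
            "path=/select params={fl=docid,score&start=0&q=polyethylene+bench+storage&fq=+rev_type:[1+TO+2]&fq=+parent_cids:25000500&fq=+dtype:(1+OR+2)&rows=20");
        Set<String> fqs = distinctFilters(lines);
        // prints: 4 distinct fqs: [dtype:(1 OR 2), parent_cids:25000500, parent_cids:438, rev_type:[1 TO 2]]
        System.out.println(fqs.size() + " distinct fqs: " + fqs);
    }
}
```

Run over a day of logs, the size of this set should stay near the ~10000 upper bound estimated above, well below the number of filterCache entries we actually observe.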

None of our filters are generally user-accessible, and we ensure that clients always provide filter queries in the same order to avoid duplicate fqs (that is, we go to some lengths to avoid things like fq=+dtype:(2+OR+1) appearing, since we already cache fq=+dtype:(1+OR+2)).
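The ordering rule described above can be enforced on the client by building each OR filter from a sorted, de-duplicated set of values. This is a minimal sketch under that assumption; `FilterCanonicalizer` and `canonicalOrFilter` are hypothetical names, not our actual client code, and the resulting string still has to be URL-encoded when it is sent as an fq parameter.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical client-side helper: builds an OR filter query with its values in a fixed
// (sorted, de-duplicated) order, so logically equal filters produce identical fq strings.
public class FilterCanonicalizer {

    static String canonicalOrFilter(String field, Collection<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("at least one value is required");
        }
        // TreeSet sorts the values numerically and removes duplicates.
        String joined = new TreeSet<>(values).stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR "));
        return field + ":(" + joined + ")";
    }

    public static void main(String[] args) {
        System.out.println(canonicalOrFilter("dtype", List.of(2, 1)));    // dtype:(1 OR 2)
        System.out.println(canonicalOrFilter("dtype", List.of(1, 2, 1))); // dtype:(1 OR 2)
    }
}
```

Sorting numerically (rather than as strings) keeps the order stable even once dtype grows to its planned 8 terms.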

Our search handler is defined with some basic parameters as follows:

---- %< ----
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="qf">title^1.0 descr^0.5 mft^0.5 brand^0.5</str>
    <str name="pf">title^3 descr^0.5</str>
    <str name="boost">product(redir,bid)</str>
    <str name="ps">4</str>
    <str name="mm">50%</str>
    <str name="defType">edismax</str>
    <int name="rows">20</int>
    <str name="facet">true</str>
    <str name="facet.field">price_bucket</str>
    <str name="facet.price_bucket.sort">count</str>
    <str name="facet.price_bucket.mincount">1</str>
    <str name="facet.price_bucket.limit">100</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
---- >% ----

price_bucket is a field we derive at index time: it takes a stored field called price and creates a term reflecting the range (or bucket) of prices that the document falls into. I originally attempted to use facet counts directly, but the instance ran out of memory; at the time, we assumed that our range of prices and the granularity of our "buckets" were creating too many filter queries. For reference, there are 239 unique terms in the price_bucket field.
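The index-time derivation described above amounts to mapping each price onto exactly one bucket term. A minimal sketch of that idea follows; the boundaries and the "lo-hi" term format are illustrative assumptions and are not our real bucketing (which yields 239 terms), and `PriceBucketer` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical index-time helper: maps a document's price to a price_bucket term.
// The boundaries and the "lo-hi" / "lo+" term format are illustrative assumptions only.
public class PriceBucketer {

    // Ascending integer lower bounds of each bucket; the last bucket is open-ended.
    private static final long[] LOWER_BOUNDS = {0, 10, 25, 50, 100, 250, 500, 1000};

    static String bucketFor(double price) {
        if (Double.isNaN(price) || price < 0) {
            throw new IllegalArgumentException("invalid price: " + price);
        }
        // With integer boundaries, flooring the price does not change which bucket it lands in.
        long key = (long) Math.floor(price);
        int idx = Arrays.binarySearch(LOWER_BOUNDS, key);
        // binarySearch returns -(insertionPoint) - 1 when key is not itself a boundary;
        // the bucket is then the one whose lower bound sits just before the insertion point.
        int bucket = idx >= 0 ? idx : -idx - 2;
        long lo = LOWER_BOUNDS[bucket];
        if (bucket == LOWER_BOUNDS.length - 1) {
            return lo + "+";
        }
        return lo + "-" + LOWER_BOUNDS[bucket + 1];
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(12.5)); // 10-25
        System.out.println(bucketFor(25));   // 25-50
        System.out.println(bucketFor(5000)); // 1000+
    }
}
```

Because every document gets a single precomputed term, a plain facet.field on price_bucket can count the buckets without issuing one range query per bucket.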

At present, our installation, indexing practices and queries are very vanilla; we are doing nothing esoteric beyond what Solr provides out of the box.

This is a fairly undesirable issue, as it means our filterCache fills rapidly with entries that are unlikely ever to be needed again.

Does anyone have any ideas on what could be causing this?

-- Greg Bowyer

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org