Strange bug when we enable faceting

Greg Bowyer Wed, 02 Nov 2011 11:18:28 -0700

When I enable faceting in Solr, for some reason our incoming user queries start being cached in the filter cache. This very quickly leads the instance to run out of memory. We could lower the size of the filterCache, but I feel this is a band-aid around a far odder problem.

I have been investigating the heap dumps that were created on our instances when we ran out of memory. These dumps show (unless YourKit is being dishonest) that the filter cache contains BoostedQueries(BooleanQueries(DisjunctionMaxQueries)) objects, each of which contains term objects that I would not expect to see in the filterCache.


A snapshot of the object graph can be seen here.
http://gbowyer.freeshell.org/filter-cache2.html

In terms of our index, queries and setup: we have a Solr 3.3 setup with sharding. Some nodes act as aggregators, with the rest acting as slaves or shards. As per recommendations, the aggregators act as dispatchers for searches, but do not themselves surface any index data.

Most of our search queries differ in the search terms but generally have the following form:

path=/aggregator/params={fl=docid,pid,score&start=0&q=dat+data+cartridge&fq=+parent_cids:438&fq=+dtype:(1+OR+2)&rows=20

path=/select params={fl=docid,score&start=0&q=polyethylene+bench+storage&enable=true&isShard=true&wt=javabin&fq=+rev_type:[1+TO+2]&fq=+parent_cids:25000500&fq=+dtype:(1+OR+2)&fsv=true&rows=20&version=2


Breaking this down, the fqs defined are against three fields:

* parent_cids - This field contains roughly 1394 terms. There are a few permutations for this field, but I would expect at most ~10000 fqs for it.

* dtype - This field has 2 terms, and we only ever query it as shown above. It is reserved for some future work and would only ever have at most 8 terms.

* rev_type - Similar to dtype; we only have 3 terms in this field.

Our filters are generally not user accessible, and we ensure that clients always provide filter queries in the same order to avoid duplicate fqs (that is, we go to some length to avoid things like fq=+dtype:(2+OR+1) appearing, since we already cache fq=+dtype:(1+OR+2)).
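
As an illustration only (this is not the actual client code), the kind of normalization described above might look like the following Python sketch. The function names `canonical_or_clause` and `canonical_fqs` are hypothetical, the filters are shown URL-decoded (spaces instead of `+` between operands), and only simple `field:(a OR b)` groups are rewritten; anything else is passed through unchanged.

```python
import re


def canonical_or_clause(fq: str) -> str:
    """Sort the operands of a simple 'field:(a OR b ...)' filter.

    Logically equal filters then produce the same string, so they can
    share one cache entry. Operands are sorted as strings, which is
    enough for a consistent (not numeric) ordering.
    """
    match = re.fullmatch(r"(\+?)(\w+):\((.+)\)", fq)
    if match is None:
        return fq  # not a simple OR group; leave it untouched
    plus, field, body = match.groups()
    operands = sorted(set(body.split(" OR ")))
    return f"{plus}{field}:({' OR '.join(operands)})"


def canonical_fqs(fqs):
    """Normalize each filter, then put the filters in a fixed order."""
    return sorted(canonical_or_clause(fq) for fq in fqs)


if __name__ == "__main__":
    print(canonical_fqs(["+parent_cids:438", "+dtype:(2 OR 1)"]))
```

With this in place, `+dtype:(2 OR 1)` and `+dtype:(1 OR 2)` are sent to the server as the same string.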


Our search handler is defined with some basic parameters as follows:

---- %< ----
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="qf">title^1.0 descr^0.5 mft^0.5 brand^0.5</str>
    <str name="pf">title^3 descr^0.5</str>
    <str name="boost">product(redir,bid)</str>
    <str name="ps">4</str>
    <str name="mm">50%</str>
    <str name="defType">edismax</str>
    <int name="rows">20</int>
    <str name="facet">true</str>
    <str name="facet.field">price_bucket</str>
    <str name="facet.price_bucket.sort">count</str>
    <str name="facet.price_bucket.mincount">1</str>
    <str name="facet.price_bucket.limit">100</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
---- >% ----

price_bucket is a field that we derive at index time: it takes a field we store called price and creates a term that reflects the range (or bucket) of prices that the given document falls into. I did originally attempt to use facet counts directly, but found that the instance failed due to running out of memory; at the time it was assumed that our range of prices and the granularity of our "buckets" were creating too many filter queries. For reference, there are 239 unique terms in the price_bucket field.
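
To make the idea of an index-time bucket term concrete, here is a minimal Python sketch. It assumes fixed-width buckets and a "lower-upper" term format; both are assumptions for illustration, as are the function name `price_bucket` and the default width of 10. The actual bucketing scheme is not described here.

```python
def price_bucket(price: float, width: int = 10) -> str:
    """Map a price to the term for the fixed-width bucket containing it.

    The bucket width and the 'lower-upper' term format are hypothetical.
    Each bucket covers lower <= price < lower + width.
    """
    if price < 0:
        raise ValueError("price must be non-negative")
    lower = int(price // width) * width
    return f"{lower}-{lower + width}"


if __name__ == "__main__":
    for p in (0, 24.99, 30):
        print(p, "->", price_bucket(p))
```

Because every document carries exactly one such term, the number of distinct facet values is bounded by the number of buckets rather than by the number of distinct prices.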

At present our installation, indexing practices and queries are very vanilla; we are doing nothing esoteric out of the box.

This is a fairly undesirable issue, as it means that our filter cache fills rapidly with cache items that are unlikely to ever be required again.
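
As a rough back-of-the-envelope check on why a filter cache full of one-off entries is costly, the Python sketch below estimates an upper bound on heap use. It assumes every cached entry is stored as a bitset with one bit per document in the index, which is a worst case, since small result sets can be stored more compactly. The document count and cache size in the example are hypothetical, not figures from our installation.

```python
def bitset_filter_bytes(max_doc: int) -> int:
    """Approximate heap bytes for one bitset-backed filter (one bit per doc)."""
    return (max_doc + 7) // 8


def worst_case_cache_bytes(max_doc: int, cache_size: int) -> int:
    """Upper bound if every cache slot holds a full bitset."""
    return bitset_filter_bytes(max_doc) * cache_size


if __name__ == "__main__":
    # Hypothetical figures: 8 million documents, 512 cache entries.
    total = worst_case_cache_bytes(8_000_000, 512)
    print(f"{total / 1_000_000:.0f} MB")
```

Under these assumptions each entry costs about 1 MB, so a cache of a few hundred entries filled with unrepeatable user queries can account for hundreds of megabytes.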


Does anyone have any ideas on what could be causing this?

-- Greg Bowyer
