Correction: the three facet fields used to initialize the TermGroupFacetCollectors are SortedDocValues (SortedDV) fields, not SortedNumericDocValues fields.

On 07/06/2015 04:56 PM, Adam Rosenwald wrote:
Hello all,

When using Lucene 5.x's group facet collectors (i.e. *AbstractGroupFacetCollector* and the provided concrete implementation, *TermGroupFacetCollector*), I repeatedly encounter OOM errors after running a few search requests on an unsharded index consisting of a few million documents. I first saw the issue in Lucene 5.0.0 and still see it in 5.2.1.

I've initialized three such collectors to accumulate values over three different facet fields (all SortedNumericDV fields). The collectors all look like the following:

==BEGIN CODE BLOCK==

    AbstractGroupFacetCollector thisFacetCollector =
        TermGroupFacetCollector.createTermGroupFacetCollector(
            groupField, thisFacetField, facetFieldMultivalued,
            facetPrefix, initialSize);

==END CODE BLOCK==

Note that facetFieldMultivalued = false, facetPrefix = null, and initialSize = 128. There are a few million unique groups indexed in the group field. The heap blows up regardless of the number of unique entries in the facet field (one of the facet fields has, e.g., fewer than 100 unique values).

I have confirmed that the heap ballooning /only/ occurs during collection time (i.e. if I comment out the three TermGroupFacetCollector assignments, I have no OOM issues; even if only one of them is enabled, the heap will eventually face OOM).

Some additional system details: I'm running Lucene 5.2.1 in a dev environment with ~8GB of heap space and 16GB of total RAM. I am not using any special codecs. I've confirmed that the indexes (incl. the sidecar facet indexes) get opened only once, during initialization of the service. Both the index and sidecar facet index directories are opened as NIOFSDirectory objects. I have also tried MMapDirectory and see the same problem.

After profiling the heap extensively and after reading the Lucene group faceting source code, I suspect that the DVs (for both the group and facet fields) and/or the arrays used to accumulate facet counts remain memory resident. After executing the same set of queries multiple times, I see heap usage balloon by 1-2GB at a time. I've tried segmenting the index, but while that reduces heap usage for ad-hoc searches, it does not get rid of the OOM issue.
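For anyone trying to reproduce the heap growth described above, a minimal sketch of one way to sample used heap between repeated runs follows. It uses only the JDK and no Lucene calls. The class name HeapGrowthProbe, its methods, and the stand-in workload in main are all hypothetical; in a real test the Runnable would execute the actual search requests with the group facet collectors enabled. System.gc() is only a hint to the JVM, so the readings are approximate.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthProbe {

    /** Used heap in bytes after requesting a GC (best effort; System.gc() is only a hint). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the action 'rounds' times; samples[0] is the baseline, samples[i] follows round i. */
    static long[] measure(Runnable action, int rounds) {
        long[] samples = new long[rounds + 1];
        samples[0] = usedHeapBytes();
        for (int i = 1; i <= rounds; i++) {
            action.run();
            samples[i] = usedHeapBytes();
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "run the same set of queries":
        // each round retains another 1 MB so the growth is visible.
        List<byte[]> retained = new ArrayList<>();
        long[] samples = measure(() -> retained.add(new byte[1 << 20]), 5);
        for (int i = 0; i < samples.length; i++) {
            System.out.printf("round %d: %d MB used%n", i, samples[i] >> 20);
        }
    }
}
```

If used heap keeps climbing across rounds of an identical workload, something is being retained between requests; a heap dump taken at that point can then show which objects hold the memory.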

Any help here would be greatly appreciated. Many thanks in advance.

--A.
