Re: OOM when using Lucene 5.X's group facet collectors on unsharded index

Adam Rosenwald Mon, 06 Jul 2015 17:25:19 -0700

Correction. The three facet fields used to initialize the TermGroupFacetCollectors are SortedDVs - not SortedNumericDVs.


On 07/06/2015 04:56 PM, Adam Rosenwald wrote:

Hello all,
When using Lucene 5.X's group facet collectors (i.e. *AbstractGroupFacetCollector* and the provided concrete implementation, *TermGroupFacetCollector*), I repeatedly encounter OOM errors after running a few search requests on an unsharded index consisting of a few million documents. I had experienced the issue in Lucene 5.0.0 and still see it when using 5.2.1.
I've initialized three such collectors to accumulate values over three different facet fields (all SortedNumericDV fields). The collectors all look like the following:
==BEGIN CODE BLOCK==

    AbstractGroupFacetCollector thisFacetCollector =
        TermGroupFacetCollector.createTermGroupFacetCollector(
            groupField, thisFacetField, facetFieldMultivalued,
            facetPrefix, initialSize);

==END CODE BLOCK==
Note that facetFieldMultivalued = false, facetPrefix = null, and initialSize = 128. There are a few million unique groups indexed in the group field. The heap blows up regardless of the number of unique entries in the facet field (one of the facet fields has, e.g., fewer than 100 unique values).
I have confirmed that the heap ballooning /only/ occurs during collection time (i.e. if I comment out the three TermGroupFacetCollector assignments, I have no OOM issues; even if only one of them is enabled, the heap will eventually face OOM).
Some additional system-related bits. I'm running Lucene 5.2.1 on a dev environment w/ ~8GB heap space w/ 16GB total RAM. I am not using any special codecs. I've confirmed that the indexes (incl. the sidecar facet indexes) get opened only once during initialization of the service. Both the index and sidecar facet index directories are opened as NIOFSDirectory objects. I have also tried MMapDirectory and experience the same problem.
After profiling the heap extensively and after reading the Lucene group faceting source code, I suspect that the DVs (for both the group and facet fields) and/or the arrays used to accumulate facet counts remain memory resident. After executing the same set of queries multiple times, I see heap usage balloon by 1-2GB at a time. I've tried segmenting the index, but while that reduces heap usage for ad-hoc searches, it does not get rid of the OOM issue.
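For readers who want to reproduce the "heap grows after each round of the same queries" observation, the following is a minimal sketch of one way to measure it with plain JDK calls. It is not the profiling setup used above and it does not touch Lucene: the class name HeapGrowthProbe, its methods, and the deliberately leaky stand-in workload are all hypothetical. In a real test the Runnable would execute the actual search with the three collectors attached.

```java
import java.util.ArrayList;
import java.util.List;

class HeapGrowthProbe {

    /** Heap bytes in use after requesting a GC (the JVM is free to ignore the request). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the workload `rounds` times and records used heap after each round. */
    static long[] measure(Runnable workload, int rounds) {
        long[] used = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            workload.run();
            used[i] = usedHeapBytes();
        }
        return used;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for state that is never released between requests.
        List<int[]> retained = new ArrayList<>();
        Runnable leakyWorkload = () -> retained.add(new int[1_000_000]); // about 4 MB kept per round

        long[] used = measure(leakyWorkload, 5);
        for (int i = 0; i < used.length; i++) {
            System.out.printf("round %d: %d MB used%n", i + 1, used[i] / (1024 * 1024));
        }
    }
}
```

If the per-round figure keeps climbing when the workload is the same set of queries, something is being retained across requests; if it levels off after the first round or two, the growth is more likely a one-time cost such as structures loaded on first use.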
    Any help here would be greatly appreciated.  Many thanks in advance.

--A.
