Facet Count strategies and common errors

Marc Davenport Mon, 30 Sep 2024 11:25:58 -0700

I've been looking at the way our code gets the facet counts from Lucene to
see if there are any obvious inefficiencies.  We have about 60 normal flat
facets, some of which are multi-valued, and 5 or so hierarchical and
multi-valued facets. I'm seeing cases where the call to create a
FastTaxonomyFacetCounts takes 1+ seconds when the query matches
800k documents.  This leads me to believe I've got some implementation
flaw.  Are there any common errors people make when implementing facets?
Are there known trouble spots that I should investigate?


Right now we retrieve the counts for the facets independently from the
retrieval of matching documents.   Each facet has its own runner, which
calculates its counts for the current query state as well as for a more
relaxed query state that shows its other values.  Different facets share a
cached facet collector if they have the same query state.   I know the
"hold one out" pattern isn't ideal.  I am looking at how we could use
DrillSideways queries, but I'm not sure I totally understand them.
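
To make the comparison concrete, here is a minimal sketch in plain Java of the counting semantics I understand drill sideways to provide: in a single pass, a document that matches every selection counts toward all facets, and a document that fails exactly one selected facet counts only toward that facet. This is an illustration under my own assumptions, not Lucene's implementation; the class name, the method and the in-memory document shape are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; none of these names are Lucene APIs.
final class DrillSidewaysSketch {

    /**
     * docs:       each document maps a dimension to its (possibly multi-valued) facet values.
     * selections: selected values per dimension (OR within a dimension, AND across dimensions).
     * dims:       the dimensions to count.
     * Returns dimension -> value -> count.
     */
    static Map<String, Map<String, Integer>> count(
            List<Map<String, Set<String>>> docs,
            Map<String, Set<String>> selections,
            Set<String> dims) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String dim : dims) {
            counts.put(dim, new HashMap<>());
        }
        for (Map<String, Set<String>> doc : docs) {
            int failures = 0;
            String failedDim = null;
            for (Map.Entry<String, Set<String>> sel : selections.entrySet()) {
                Set<String> values = doc.getOrDefault(sel.getKey(), Set.of());
                boolean matches = values.stream().anyMatch(sel.getValue()::contains);
                if (!matches) {
                    failures++;
                    failedDim = sel.getKey();
                }
            }
            if (failures > 1) {
                continue; // misses two or more selections: contributes to nothing
            }
            for (String dim : dims) {
                // A full hit counts for every dimension; a near miss counts
                // only for the single dimension whose selection it failed.
                if (failures == 1 && !dim.equals(failedDim)) {
                    continue;
                }
                for (String value : doc.getOrDefault(dim, Set.of())) {
                    counts.get(dim).merge(value, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, Set<String>>> docs = List.of(
                Map.of("color", Set.of("red"), "size", Set.of("S")),
                Map.of("color", Set.of("blue"), "size", Set.of("S")),
                Map.of("color", Set.of("red"), "size", Set.of("M")));
        Map<String, Set<String>> selections =
                Map.of("color", Set.of("red"), "size", Set.of("S"));
        System.out.println(count(docs, selections, Set.of("color", "size")));
    }
}
```

If that reading is right, the separate relaxed query per facet in the "hold one out" approach is replaced by one pass over the documents that match all but at most one of the selections.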

The FastTaxonomyFacetCounts creation time scales with the number and
cardinality of the facets on the documents. We have already pruned facets
that are no longer needed.  Would it make sense to start maintaining more
than one Taxonomy Index?

I've been looking for any good books or resources to read about Lucene.  I
have the original Lucene in Action, which has been helpful in some ways,
but covers only v3. Many newer concepts are more or less left to the
Javadoc or to reading through the PRs.   Any suggestions on things to read
to better understand Lucene and its proper use?

Thank you,
Marc
