Hi Marc,

I'm curious what version of Lucene you're using.

Outside that, I can give two pointers.

1. I think you're right to want to look into using DrillSideways for your
   use-case. There are some examples in the demo package [1], which should
   be helpful.
2. There is a new aggregation engine [2] in Lucene 9.12, in the sandbox
   module for now, if you're willing to consider it. It facets at match-time
   and is generally faster than the faceting we had before 9.12.

Stefan

[1] https://github.com/apache/lucene/tree/main/lucene/demo/src/java/org/apache/lucene/demo/facet
[2] https://github.com/apache/lucene/pull/13568

On Mon, 30 Sept 2024 at 19:26, Marc Davenport <madavenp...@cargurus.com.invalid> wrote:

> I've been looking at the way our code gets the facet counts from Lucene and
> see if there are some obvious inefficiencies. We have about 60 normal flat
> facets, some of which are multi-valued, and 5 or so hierarchical and
> multi-valued facets. I'm seeing cases where the call to create a
> FastTaxonomyFacetCounts is taking 1+ seconds when it would be matching on
> 800k documents. This leads me to believe I've got some implementation
> flaw. Are there any common errors people make when implementing facets?
> Known trouble spots that I should investigate?
>
> Right now we retrieve the counts for the facets independently from the
> retrieval of matching documents. Each facet has its own runner which will
> calculate its current counts as well as a more relaxed query state that
> will show its other values. Different facets will share a cached facet
> collector if they have the same query state. I know the "hold one out"
> pattern isn't ideal. I am looking at how we could use the
> drillsideways queries, but I'm not sure I totally understand them.
>
> The FastTaxonomyFacetCounts creation speed is in relation to the number and
> cardinality of the facets on the documents. We pruned off no longer needed
> facets. Would it make sense to start maintaining more than one Taxonomy
> Index?
>
> I've been looking for any good books or resources to read about Lucene. I
> have the original Lucene in Action, which has been helpful in some ways,
> but covers only v3. Many newer concepts are sort of left to Javadoc, or
> reading through the PRs. Any suggestions on things to read to better
> understand Lucene and its proper use?
>
> Thank you,
> Marc
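
For readers unsure what drill-sideways counting buys over the "hold one out" pattern described above, here is a minimal sketch of the idea in plain Java. It does not use Lucene and is not how Lucene implements DrillSideways; the class name DrillSidewaysSketch, the count method, and the representation of documents as single-valued maps are all assumptions made for illustration. The point it shows: in one pass over the documents matching the base query, a document that passes every selected facet filter is a hit and counts toward every dimension, while a document that fails exactly one filter (a "near miss") counts only toward the dimension it failed. That yields the "other values" for each selected dimension without re-running a relaxed query per facet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene.
final class DrillSidewaysSketch {

    /** Hit count plus value counts per dimension. */
    static final class Result {
        final int hits;
        final Map<String, Map<String, Integer>> counts;

        Result(int hits, Map<String, Map<String, Integer>> counts) {
            this.hits = hits;
            this.counts = counts;
        }
    }

    /**
     * docs: documents that already match the base query, each a map of
     * dimension -> value (single-valued here, for simplicity).
     * selected: the facet filters the user has applied, dimension -> value.
     */
    static Result count(List<Map<String, String>> docs, Map<String, String> selected) {
        int hits = 0;
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            int failures = 0;
            String failedDim = null;
            for (Map.Entry<String, String> sel : selected.entrySet()) {
                if (!sel.getValue().equals(doc.get(sel.getKey()))) {
                    failures++;
                    failedDim = sel.getKey();
                }
            }
            if (failures > 1) {
                continue; // fails two or more filters: contributes to nothing
            }
            if (failures == 0) {
                hits++;
            }
            for (Map.Entry<String, String> field : doc.entrySet()) {
                String dim = field.getKey();
                // A full hit counts toward every dimension; a near miss
                // counts only toward the one dimension whose filter it failed.
                if (failures == 0 || dim.equals(failedDim)) {
                    counts.computeIfAbsent(dim, k -> new HashMap<>())
                          .merge(field.getValue(), 1, Integer::sum);
                }
            }
        }
        return new Result(hits, counts);
    }
}
```

With four documents covering every combination of make in {Honda, Toyota} and color in {red, blue}, and the filters make=Honda and color=red applied, this gives one hit, while both makes and both colors still show a count of one each, which is what a facet sidebar needs in order to display the alternatives to the current selection.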