Hi Marc,

I'm curious what version of Lucene you're using.

Outside that, I can give two pointers.

1. I think you're right to want to look into using DrillSideways for your
use-case. There are some examples in the demo package [1], which
should be helpful.

2. There is a new aggregation engine [2] in Lucene 9.12, in the sandbox
module for now, if you're willing to consider it. It facets at match-time
and is generally faster than the faceting we had before 9.12.
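
Since you mentioned not being sure you fully understand drill-sideways, here
is a minimal plain-Java sketch of the idea as I understand it. It is not
Lucene code and uses no Lucene API: the class name, method names, and the
single-valued "dimension -> value" document model are all made up for
illustration. The point it shows is that one pass over the documents can
yield, for each dimension, the counts you would get by relaxing only that
dimension's selection, which is what the "hold one out" pattern computes
with one query per facet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
class DrillSidewaysSketch {

    /**
     * docs:     each document is a map of dimension -> value
     *           (single-valued here to keep the sketch short).
     * selected: the current drill-down selections, dimension -> value.
     * dims:     the dimensions to report counts for.
     * Returns, per dimension, value -> count over the documents that
     * satisfy the selections on every OTHER dimension.
     */
    static Map<String, Map<String, Integer>> sidewaysCounts(
            List<Map<String, String>> docs,
            Map<String, String> selected,
            List<String> dims) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String dim : dims) {
            counts.put(dim, new HashMap<>());
        }
        for (Map<String, String> doc : docs) {
            int failures = 0;
            String failedDim = null;
            for (Map.Entry<String, String> sel : selected.entrySet()) {
                if (!sel.getValue().equals(doc.get(sel.getKey()))) {
                    failures++;
                    failedDim = sel.getKey();
                }
            }
            if (failures == 0) {
                // Full hit: contributes to every dimension's counts.
                for (String dim : dims) {
                    bump(counts, dim, doc);
                }
            } else if (failures == 1) {
                // Near miss: contributes only to the dimension it failed,
                // because relaxing that one selection would make it match.
                bump(counts, failedDim, doc);
            }
            // Two or more failures: the document is ignored entirely.
        }
        return counts;
    }

    private static void bump(Map<String, Map<String, Integer>> counts,
                             String dim, Map<String, String> doc) {
        String value = doc.get(dim);
        if (value != null) {
            counts.computeIfAbsent(dim, d -> new HashMap<>())
                  .merge(value, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("make", "Honda", "color", "red"),
                Map.of("make", "Honda", "color", "blue"),
                Map.of("make", "Ford", "color", "red"),
                Map.of("make", "Ford", "color", "blue"));
        Map<String, String> selected = Map.of("make", "Honda", "color", "red");
        System.out.println(
                sidewaysCounts(docs, selected, List.of("make", "color")));
    }
}
```

With "make=Honda" and "color=red" selected, the "make" counts cover every
red car and the "color" counts cover every Honda, all from a single pass.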

Stefan

[1]
https://github.com/apache/lucene/tree/main/lucene/demo/src/java/org/apache/lucene/demo/facet
[2] https://github.com/apache/lucene/pull/13568


On Mon, 30 Sept 2024 at 19:26, Marc Davenport
<madavenp...@cargurus.com.invalid> wrote:

> I've been looking at the way our code gets the facet counts from Lucene to
> see if there are some obvious inefficiencies.  We have about 60 normal flat
> facets, some of which are multi-valued, and 5 or so hierarchical and
> multi-valued facets. I'm seeing cases where the call to create a
> FastTaxonomyFacetCounts is taking 1+ seconds when it would be matching on
> 800k documents.  This leads me to believe I've got some implementation
> flaw.  Are there any common errors people make when implementing facets?
> Known trouble spots that I should investigate?
>
> Right now we retrieve the counts for the facets independently from the
> retrieval of matching documents.   Each facet has its own runner which will
> calculate its current counts as well as a more relaxed query state that
> will show its other values.  Different facets will share a cached facet
> collector if they have the same query state.   I know the "hold one out"
> pattern isn't ideal.  I am looking at how we could use
> DrillSideways queries, but I'm not sure I totally understand them.
>
> The FastTaxonomyFacetCounts creation speed is in relation to the number and
> cardinality of the facets on the documents. We pruned off no longer needed
> facets.  Would it make sense to start maintaining more than one Taxonomy
> Index?
>
> I've been looking for any good books or resources to read about Lucene.  I
> have the original Lucene in Action, which has been helpful in some ways,
> but covers only v3. Many newer concepts are sort of left to the Javadoc, or
> to reading through the PRs.   Any suggestions on things to read to better
> understand Lucene and its proper use?
>
> Thank you,
> Marc
>
