On Thu, Dec 16, 2021 at 2:29 PM Robert Muir <[email protected]> wrote:
>
> On Thu, Dec 16, 2021 at 5:05 PM Greg Miller <[email protected]> wrote:
> >
> > On Thu, Dec 16, 2021 at 1:31 PM Robert Muir <[email protected]> wrote:
> > >
> > > On Thu, Dec 16, 2021 at 3:53 PM Greg Miller <[email protected]> wrote:
> > > >
> > >
> > > > TaxonomyReader was recently updated
> > > > to support bulk ordinal resolution (LUCENE-9476), but SSDV faceting is
> > > > stuck looking up paths one-at-a-time via SSDV#lookupOrd(ord). This
> > > > results in a separate TermsEnum#seekExact() call down in
> > > > Lucene90DocValuesProducer for each ordinal being returned.
> > > >
> > >
> > > I'm confused, where do we do gazillions of lookupOrd(), we should not
> > > be doing that. The ordinals should be used for all the heavy-duty
> > > work, and at the very end, only the top-10 or whatever resolved back
> > > to strings with lookupOrd. Think of it kinda like the stored fields :)
> >
> > This is right, but we still need to do the lookup for each value being
> > returned (which is bounded by the top-n param supplied by the user).
> > In getAllDims, we'll do "n" lookups for every dimension indexed. So
> > while we're working in "ordinal space" for doing all the counting and
> > such, there could still be a somewhat sizable number of ordinals that
> > need to be looked up after counting. This is where taxo-faceting leans
> > on bulk lookups.
>
> OK I need to understand this better, because I don't see why it is
> necessary to do it this way. It definitely is very different from the
> way solr wiki page documents hierarchical faceting. Maybe we should
> adopt their approach?

This is separate from adding hierarchical support. I'm probably not
communicating the current state well, but here's where SSDV faceting
does ordinal lookups:
https://github.com/apache/lucene/blob/c64e5fe84c4990968844193e3a62f4ebbba638ea/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L148

So this lookup happens once per returned value, which, as you describe,
scales with the requested top-n. For getAllDims, the same logic is
executed once for every dimension, so the total is roughly top-n
lookups times the number of dimensions.

I don't think these lookups are avoidable since we provide the path
for each returned value, and in order to get the path, we need to
dereference the ordinal.
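
To make the pattern concrete, here is a rough, self-contained sketch of
what I mean. It is not the actual Lucene code: the class name, the
topN method, and the lookupOrd function argument are all hypothetical
stand-ins (the last one plays the role of SSDV#lookupOrd(ord)). All the
counting and top-n selection stays in ordinal space, and only the
ordinals that survive selection get resolved to labels.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

// Hypothetical illustration only; not part of Lucene.
final class TopNOrdinalSketch {

    /**
     * Selects the top-n ordinals by count, then resolves only those
     * ordinals to labels.
     *
     * @param counts    per-ordinal counts, already accumulated in "ordinal space"
     * @param n         the top-n requested by the caller
     * @param lookupOrd stand-in for a per-ordinal lookup such as lookupOrd(ord)
     */
    static List<Map.Entry<String, Integer>> topN(
            int[] counts, int n, IntFunction<String> lookupOrd) {
        // Min-heap ordered by count; on ties the larger ordinal sits at the
        // head, so it is evicted first and smaller ordinals are preferred.
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) ->
                counts[a] != counts[b]
                        ? Integer.compare(counts[a], counts[b])
                        : Integer.compare(b, a));

        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] == 0) {
                continue; // skip values that never occurred
            }
            heap.add(ord);
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest candidate
            }
        }

        // The heap drains weakest-first, so fill the result from the back.
        @SuppressWarnings("unchecked")
        Map.Entry<String, Integer>[] result = new Map.Entry[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            int ord = heap.poll();
            // One lookup per returned value: this is the unavoidable part.
            result[i] = Map.entry(lookupOrd.apply(ord), counts[ord]);
        }
        return Arrays.asList(result);
    }
}
```

With this shape, the number of lookupOrd calls is bounded by n for a
single dimension. A getAllDims-style call that repeats the selection
per dimension would pay up to n lookups for each dimension, which is
where a bulk lookup could help.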

>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
