Re: FacetFieldProcessorByArrayDV recomputes ords for each request?

Joel Bernstein Thu, 01 Sep 2022 19:58:11 -0700

The fact that it's slow with 100 docs makes me wonder: how many values are in
the multi-value field?


I'll load up some docs tomorrow with a multi-value field and see how it
performs locally.
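
For reference, the caching pattern under discussion can be sketched in plain
Java roughly as below. This is only an illustration, assuming a per-reader map
keyed by field name. OrdCacheSketch, getOrdMap and buildOrdMap are hypothetical
names, the long[] stands in for a real OrdinalMap, and none of it is Solr's or
Lucene's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-reader cache of ordinal maps; NOT Solr's actual class. */
public class OrdCacheSketch {
    // One entry per field name; lives as long as this (hypothetical) reader.
    private final Map<String, long[]> cachedOrdMaps = new ConcurrentHashMap<>();

    // Counts how many times the expensive build actually ran.
    private final AtomicInteger builds = new AtomicInteger();

    // Stand-in for the expensive merge of per-segment ordinals (assumption).
    private long[] buildOrdMap(String field) {
        builds.incrementAndGet();
        return new long[] {0L, 1L, 2L};
    }

    /** Returns the cached map for the field, building it on first use only. */
    public long[] getOrdMap(String field) {
        return cachedOrdMaps.computeIfAbsent(field, this::buildOrdMap);
    }

    public int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        OrdCacheSketch reader = new OrdCacheSketch();
        reader.getOrdMap("tags"); // first request: builds the map
        reader.getOrdMap("tags"); // second request: served from the cache
        System.out.println("builds = " + reader.buildCount()); // prints "builds = 1"
    }
}
```

If the build count grows with every request in a setup like this, the cache
instance itself is probably being recreated per request rather than reused.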





Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 1, 2022 at 9:50 PM Michael Gibney <[email protected]> wrote:

> Yes, as you've found, the ordmap is cached via
> SlowCompositeReaderWrapper. It's a bit opaque that this is (iiuc?) the
> main acceptable use of SlowCompositeReaderWrapper: as a wrapper
> around OrdinalMaps. I'm pretty sure I remember looking into this, and
> the `si.mapping` in `findStartAndEndOrds` does in fact come via the
> cachedOrdMaps in SlowCompositeReaderWrapper.
>
> So I'm surprised you're finding this to be a bottleneck; it's definitely
> worth investigating. If you're using a standalone index and doing JVM
> profiling, this issue is probably of limited relevance, but it covers
> some similar ground: https://issues.apache.org/jira/browse/SOLR-15008.
>
>
> On Mon, Aug 29, 2022 at 9:21 AM Dawid Weiss <[email protected]> wrote:
> >
> > Digging deeper - hmmm... so there is a cache of ords
> > in SlowCompositeReaderWrapper:
> >
> >   // TODO: consider ConcurrentHashMap ?
> >   // TODO: this could really be a weak map somewhere else on the coreCacheKey,
> >   // but do we really need to optimize slow-wrapper any more?
> >   final Map<String,OrdinalMap> cachedOrdMaps = new HashMap<>();
> >
> > I wonder why this doesn't seem to be used from request to request in my
> > case, eh.
> >
> > Dawid
> >
> > On Mon, Aug 29, 2022 at 3:07 PM Dawid Weiss <[email protected]> wrote:
> >
> > >
> > > Hi,
> > >
> > > I have a situation here with Solr 8.11.2 in stand-alone mode, a large
> > > (200GB+) index with multi-valued doc value string fields. The problem is
> > > that faceting over these fields takes a long time. Before you say: "well,
> > > duh, of course" I wanted to point out that it takes a long time for *every*
> > > query, even those that collect facets from a relatively small subset of all
> > > documents (say, one hundred).
> > >
> > > Looking at and debugging the code, I see a few things that made me scratch
> > > my head, but this one is particularly troubling.
> > >
> > > So, the faceting code goes into FacetFieldProcessorByArrayDV and then most
> > > of the time is spent inside findStartAndEndOrds, looking basically for the
> > > count of unique ordinals (for all segments). Now, because this is done with
> > > a slow reader wrapper, it takes forever. And it's repeated for each and
> > > every request - even though clearly the ordinal map (or the count of
> > > values) won't change for the same reader:
> > >
> > >   @Override
> > >   protected void findStartAndEndOrds() throws IOException {
> > >     if (multiValuedField) {
> > >       si = FieldUtil.getSortedSetDocValues(fcontext.qcontext, sf, null);
> > >       if (si instanceof MultiDocValues.MultiSortedSetDocValues) {
> > >         ordinalMap = ((MultiDocValues.MultiSortedSetDocValues)si).mapping;
> > >       }
> > >
> > > If there is any rationale behind not caching the ordinal map (or just the
> > > size of all ords!) then I failed to see it. Otherwise it's a serious
> > > slowdown, and fixing it could help many poor souls speed up Solr
> > > faceting.
> > >
> > > Anybody familiar with that code who could comment on the above?
> > >
> > > Dawid
> > >
>
