Re: FacetFieldProcessorByArrayDV recomputes ords for each request?

Michael Gibney Thu, 01 Sep 2022 18:50:44 -0700

Yes, as you've found, the ordmap is cached via
SlowCompositeReaderWrapper. It's a bit opaque that this is (iiuc?) the
main acceptable use of SlowCompositeReaderWrapper -- as a wrapper
around OrdinalMaps. I'm pretty sure I remember looking into this, and
the `si.mapping` in `findStartAndEndOrds` does in fact come via the
cachedOrdMaps in SlowCompositeReaderWrapper.
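
To make the caching pattern concrete, here is a minimal plain-Java sketch
of a lazily populated, per-field cache shaped like the `cachedOrdMaps`
field quoted below. It is not the Solr source: the class name
`OrdMapCacheSketch`, the `getOrdMap` and `buildCount` methods, and the use
of `long[]` as a stand-in for OrdinalMap are all assumptions made for
illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a reader wrapper that caches one ordinal map per field.
class OrdMapCacheSketch {
  // Same shape as the quoted field; long[] stands in for OrdinalMap (assumption).
  private final Map<String, long[]> cachedOrdMaps = new HashMap<>();
  // Counts how many times the expensive build actually ran.
  private int buildCount = 0;

  // Returns the cached map for the field, building it only on the first request.
  long[] getOrdMap(String field, Function<String, long[]> expensiveBuilder) {
    synchronized (cachedOrdMaps) {
      long[] map = cachedOrdMaps.get(field);
      if (map == null) {
        map = expensiveBuilder.apply(field);
        cachedOrdMaps.put(field, map);
        buildCount++;
      }
      return map;
    }
  }

  int buildCount() {
    synchronized (cachedOrdMaps) {
      return buildCount;
    }
  }
}
```

With this shape, two lookups for the same field on the same wrapper
instance trigger a single build. The cache lives in an instance field, so
it only helps for as long as that wrapper instance is reused.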


So I'm surprised you're finding this to be a bottleneck; it's definitely
worth investigating. If you're using a standalone index and doing JVM
profiling, this issue is probably of limited relevance, but it covers
some similar ground: https://issues.apache.org/jira/browse/SOLR-15008.


On Mon, Aug 29, 2022 at 9:21 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> Digging deeper - hmmm... so there is a cache of ords
> in SlowCompositeReaderWrapper:
>
>   // TODO: consider ConcurrentHashMap ?
>   // TODO: this could really be a weak map somewhere else on the coreCacheKey,
>   // but do we really need to optimize slow-wrapper any more?
>   final Map<String,OrdinalMap> cachedOrdMaps = new HashMap<>();
>
> I wonder why this doesn't seem to be used from request to request in my
> case, eh.
>
> Dawid
>
> On Mon, Aug 29, 2022 at 3:07 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> >
> > Hi,
> >
> > I have a situation here with Solr 8.11.2 in stand-alone mode, a large
> > (200GB+) index with multi-valued doc value string fields. The problem is
> > that faceting over these fields takes a long time. Before you say: "well,
> > duh, of course" I wanted to point out that it takes a long time for *every*
> > query, even those that collect facets from a relatively small subset of all
> > documents (say, one hundred).
> >
> > Looking at and debugging the code, I see a few things that made me
> > scratch my head, but this one is particularly troubling.
> >
> > So, the faceting code goes into FacetFieldProcessorByArrayDV and then most
> > of the time is spent inside findStartAndEndOrds, looking basically for the
> > count of unique ordinals (for all segments). Now, because this is done with
> > a slow reader wrapper, it takes forever. And it's repeated for each and
> > every request - even though clearly the ordinal map (or the count of
> > values) won't change for the same reader:
> >
> >   @Override
> >   protected void findStartAndEndOrds() throws IOException {
> >     if (multiValuedField) {
> >       si = FieldUtil.getSortedSetDocValues(fcontext.qcontext, sf, null);
> >       if (si instanceof MultiDocValues.MultiSortedSetDocValues) {
> >         ordinalMap = ((MultiDocValues.MultiSortedSetDocValues)si).mapping;
> >       }
> >
> > If there is any rationale behind not caching the ordinal map (or just the
> > size of all ords!) then I failed to see it. Otherwise it's a serious
> > concern and slowdown that I think could help many poor souls speed up Solr
> > faceting.
> >
> > Anybody familiar with that code who could comment on the above?
> >
> > Dawid
> >

