FacetFieldProcessorByArrayDV recomputes ords for each request?

Dawid Weiss Mon, 29 Aug 2022 06:08:15 -0700

Hi,

I have a situation here with Solr 8.11.2 in stand-alone mode, a large
(200GB+) index with multi-valued doc value string fields. The problem is
that faceting over these fields takes a long time. Before you say "well,
duh, of course", I want to point out that it takes a long time for *every*
query, even those that collect facets from a relatively small subset of all
documents (say, one hundred).


Looking at and debugging the code, I see a few things that made me scratch
my head, but this one is particularly troubling.

So, the faceting code goes into FacetFieldProcessorByArrayDV and then most
of the time is spent inside findStartAndEndOrds, looking basically for the
count of unique ordinals (for all segments). Now, because this is done with
a slow reader wrapper, it takes forever. And it's repeated for each and
every request - even though clearly the ordinal map (or the count of
values) won't change for the same reader:

  @Override
  protected void findStartAndEndOrds() throws IOException {
    if (multiValuedField) {
      si = FieldUtil.getSortedSetDocValues(fcontext.qcontext, sf, null);
      if (si instanceof MultiDocValues.MultiSortedSetDocValues) {
        ordinalMap = ((MultiDocValues.MultiSortedSetDocValues)si).mapping;
      }

If there is any rationale behind not caching the ordinal map (or just the
total ord count!) then I failed to see it. Otherwise it's a serious
slowdown, and fixing it could help many poor souls speed up Solr
faceting.
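
A minimal sketch of what such caching could look like, assuming the expensive
result (the ordinal map or just the ord count) depends only on the top-level
reader and the field. This is not Solr code: PerReaderFieldCache and its get
method are hypothetical names, and readerKey stands in for whatever cache key
the reader exposes.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Solr): memoizes one value per (reader key, field). */
final class PerReaderFieldCache<V> {
  // Weak keys: once a reader key is no longer referenced elsewhere,
  // its cached entries become eligible for garbage collection.
  private final Map<Object, Map<String, V>> byReader =
      Collections.synchronizedMap(new WeakHashMap<>());

  /** Returns the cached value, running compute only on the first call for this pair. */
  V get(Object readerKey, String field, Supplier<V> compute) {
    Map<String, V> perField =
        byReader.computeIfAbsent(readerKey, k -> new ConcurrentHashMap<>());
    // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
    return perField.computeIfAbsent(field, f -> compute.get());
  }
}
```

With something like this, the slow per-request computation would run once per
reader and field, and later requests against the same reader would only pay
for a map lookup. Whether the values are safe to share between requests is an
assumption that would need checking against the actual code.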

Anybody familiar with that code who could comment on the above?

Dawid
