[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Michael Gibney (Jira) Wed, 04 Mar 2020 10:29:27 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051505#comment-17051505
 ]


Michael Gibney commented on SOLR-13132:
---------------------------------------

Thanks so much for the review and feedback, [~hossman]! This has been on the 
back burner for me for over a year, but I'm eager to pay some attention to it, 
especially now that I'm able to incorporate (and hopefully address) this 
initial feedback.

# I will separate out the facet cache as an independent PR associated with 
SOLR-13807. It is generally useful independent of any sweep/relatedness work, 
and has such a significant impact on the relatedness work that it might be 
reasonable to treat it as a dependency of this issue.
# I will write tests along the lines of what you've suggested, which will 
hopefully clarify and exercise some of what's going on (you're of course 
correct that all of the caching code is a no-op unless a {{termFacetCache}} 
is configured, so it is impossible to "rely on existing tests"). One point I 
hope to revisit and clarify with testing is {{QueryResultKey}}: I believe the 
change I introduced treats the main query as equivalent to filters only when 
no sort is specified, which I think should actually be ok (entries in 
{{queryResultCache}} should always have a sort specified? -- in any case 
that's what I thought a year ago :-)).

I have some questions about exactly how to present the facet cache PR (whether 
to bother making it compatible with SimpleFacets in addition to JSON Facets, 
for one), but I'll ask those in a more deliberate way over at SOLR-13807.

Thanks again for the feedback, and I plan to follow up soon.

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g., 9000ms at 
> cardinality 1816684 and 18000ms at cardinality 5032902).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms, respectively. The approach calculates uninverted facet 
> counts over the domain base, foreground, and background docSets in parallel 
> in a single pass. This allows us to take advantage of the efficiencies built 
> into the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and 
> avoids the per-term docSet creation and set intersection overhead.
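
To make the contrast in the description above concrete, here is a minimal 
sketch of the two counting strategies. It is a toy model and not the code in 
the attached patch: the class name, the method names, and the 
{{docTermOrds}} array (a stand-in for an uninverted docValues-style view, one 
array of distinct term ordinals per document) are all hypothetical, and plain 
{{java.util.BitSet}} stands in for Solr docSets.

```java
import java.util.Arrays;
import java.util.BitSet;

// Toy model only: all names here are hypothetical and are not Solr APIs.
final class SweepCountSketch {
    static final int BASE = 0, FG = 1, BG = 2;

    // Helper: build a doc set from explicit doc ids.
    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) set.set(doc);
        return set;
    }

    // Inverted style: materialize one doc set per term, then intersect it
    // with the domain. The cost grows with the number of terms.
    static int[] perTermCounts(int[][] docTermOrds, int numTerms, BitSet domain) {
        int[] counts = new int[numTerms];
        for (int ord = 0; ord < numTerms; ord++) {
            BitSet termDocs = new BitSet(docTermOrds.length);
            for (int doc = 0; doc < docTermOrds.length; doc++) {
                for (int o : docTermOrds[doc]) {
                    if (o == ord) termDocs.set(doc);
                }
            }
            termDocs.and(domain); // set intersection, once per term
            counts[ord] = termDocs.cardinality();
        }
        return counts;
    }

    // Sweep style: visit each relevant doc once and bump the per-term
    // counters for all three sets in the same pass.
    static int[][] sweepCounts(int[][] docTermOrds, int numTerms,
                               BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[3][numTerms];
        BitSet union = (BitSet) base.clone();
        union.or(fg);
        union.or(bg);
        for (int doc = union.nextSetBit(0); doc >= 0; doc = union.nextSetBit(doc + 1)) {
            if (doc >= docTermOrds.length) break; // ignore ids beyond the toy index
            boolean inBase = base.get(doc), inFg = fg.get(doc), inBg = bg.get(doc);
            for (int ord : docTermOrds[doc]) {
                if (inBase) counts[BASE][ord]++;
                if (inFg) counts[FG][ord]++;
                if (inBg) counts[BG][ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docTermOrds = {{0, 1}, {1}, {2, 0}, {}};
        int[][] counts = sweepCounts(docTermOrds, 3,
                bits(0, 1, 2), bits(0, 2), bits(0, 1, 2, 3));
        System.out.println("base=" + Arrays.toString(counts[BASE]));
        System.out.println("fg=" + Arrays.toString(counts[FG]));
        System.out.println("bg=" + Arrays.toString(counts[BG]));
    }
}
```

In this model both methods produce the same per-term counts for a given set; 
the difference is that the sweep never builds a per-term doc set or performs 
a per-term intersection, which is the overhead the description identifies as 
the bottleneck for high-cardinality fields.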



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
