Toke Eskildsen created SOLR-5894:
------------------------------------
Summary: Speed up high-cardinality facets with sparse counters
Key: SOLR-5894
URL: https://issues.apache.org/jira/browse/SOLR-5894
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 4.7, 4.6.1
Reporter: Toke Eskildsen
Priority: Minor
Fix For: 4.6.1
Field-based faceting in Solr has two phases: collecting counts for tags in
facets and extracting the requested tags.
The execution time for the collecting phase is approximately linear in the
number of hits and the number of references from hits to tags. This phase is
not the focus here.
The extraction time scales with the number of unique tags in the search result,
but is also heavily influenced by the total number of unique tags in the facet
as every counter, 0 or not, is visited by the extractor (at least for count
order). For fields with millions of unique tag values, this adds tens of
milliseconds to the minimum response time (see
https://sbdevel.wordpress.com/2014/03/18/sparse-facet-counting-on-a-real-index/
for a test on a corpus with 7M unique values in the facet).
The extractor needs to visit every counter because the current counter
structure is a plain int array of size #unique_tags. Switching to a sparse
structure, where only the tag counters > 0 are visited, makes the extraction
time linear in the number of unique tags in the result set.
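
To illustrate why the dense layout forces a full scan, here is a minimal Java
sketch of count-order extraction over a plain int array. It is not Solr code:
the class name DenseTopN and the method topN are hypothetical, and the heap is
just one plausible way to select the top entries.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch, not Solr code: top-N extraction over a dense counter array. */
final class DenseTopN {
    /** Returns the ordinals of the n highest non-zero counts, highest count first. */
    static int[] topN(int[] counts, int n) {
        // Min-heap on count, so the weakest candidate is evicted first.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(counts[a], counts[b]));
        // Every counter is visited, zero or not: the loop costs O(#unique_tags)
        // even when only a handful of tags were hit by the query.
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] == 0) continue;
            heap.add(ord);
            if (heap.size() > n) heap.poll();
        }
        // Drain the heap from lowest to highest count, filling the result backwards.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll();
        return result;
    }
}
```

With 7M counters and a result set touching a few hundred tags, almost all
iterations of the scan loop in this sketch would hit a zero and do nothing
useful.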
Unfortunately, the number of unique tags in the result set is unknown at
collect time, so it is not possible to reliably select sparse counting vs. full
counting up front. Luckily, there exist solutions for sparse sets that have the
property of switching to non-sparse mode without a switch penalty when the
sparse threshold is exceeded (see
http://programmingpraxis.com/2012/03/09/sparse-sets/ for an example). This JIRA
aims to implement this functionality in Solr (a proof of concept patch will be
provided shortly).
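
As a rough illustration of the idea, the following Java sketch keeps the usual
int array of counters and adds a bounded tracker of ordinals whose counter has
become non-zero. It is not the proof of concept patch: the class name
SparseCounter, the sparseThreshold parameter and all method names are
assumptions made for this example, and it is a simplification of the sparse
set described at the link above (the counter value itself acts as the
membership test).

```java
import java.util.Arrays;

/** Hypothetical sketch, not the SOLR-5894 patch: counters plus a tracker of touched ordinals. */
final class SparseCounter {
    private final int[] counts;   // one counter per unique tag, as in the dense layout
    private final int[] touched;  // ordinals whose counter became non-zero, in first-seen order
    private int touchedSize = 0;
    private boolean sparse = true;

    SparseCounter(int uniqueTags, int sparseThreshold) {
        counts = new int[uniqueTags];
        touched = new int[sparseThreshold];
    }

    void increment(int ord) {
        // Only the first hit on a counter needs to be recorded in the tracker.
        if (counts[ord]++ == 0 && sparse) {
            if (touchedSize == touched.length) {
                // Threshold exceeded: stop tracking. The counts array is already
                // complete, so nothing has to be copied or rebuilt.
                sparse = false;
            } else {
                touched[touchedSize++] = ord;
            }
        }
    }

    boolean isSparse() { return sparse; }

    int count(int ord) { return counts[ord]; }

    /**
     * Ordinals with a non-zero count. In sparse mode only the tracked ordinals
     * are visited (first-seen order); otherwise all counters are scanned
     * (ascending order).
     */
    int[] nonZeroOrdinals() {
        if (sparse) return Arrays.copyOf(touched, touchedSize);
        int n = 0;
        for (int c : counts) if (c != 0) n++;
        int[] result = new int[n];
        int i = 0;
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] != 0) result[i++] = ord;
        }
        return result;
    }
}
```

In this sketch the cost of exceeding the threshold is a single flag flip,
which is the "no switch penalty" property in miniature. The price is one extra
int array of threshold size and one extra comparison per increment while in
sparse mode.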