Toke Eskildsen created SOLR-5894:
------------------------------------

             Summary: Speed up high-cardinality facets with sparse counters
                 Key: SOLR-5894
                 URL: https://issues.apache.org/jira/browse/SOLR-5894
             Project: Solr
          Issue Type: Improvement
          Components: SearchComponents - other
    Affects Versions: 4.7, 4.6.1
            Reporter: Toke Eskildsen
            Priority: Minor
             Fix For: 4.6.1


Field based faceting in Solr has two phases: Collecting counts for tags in 
facets and extracting the requested tags.

The execution time for the collecting phase is approximately linear to the 
number of hits and the number of references from hits to tags. This phase is 
not the focus here.

The extraction time scales with the number of unique tags in the search result, 
but is also heavily influenced by the total number of unique tags in the facet 
as every counter, 0 or not, is visited by the extractor (at least for count 
order). For fields with millions of unique tag values this means 10s of 
milliseconds added to the minimum response time (see 
https://sbdevel.wordpress.com/2014/03/18/sparse-facet-counting-on-a-real-index/ 
for a test on a corpus with 7M unique values in the facet).

The extractor needs to visit every counter due to the current counter structure 
being a plain int-array of size #unique_tags. Switching to a sparse structure, 
where only the tag counters > 0 are visited, makes the extraction time linear 
to the number of unique tags in the result set.

Unfortunately the number of unique tags in the result set is unknown at collect 
time, so it is not possible to reliably select sparse counting vs. full 
counting up front. Luckily there exists solutions for sparse sets that has the 
property of switching to non-sparse-mode without a switch-penalty, when the 
sparse-threshold is exceeded (see 
http://programmingpraxis.com/2012/03/09/sparse-sets/ for an example). This JIRA 
aims to implement this functionality in Solr (a proof of concept patch will be 
provided shortly).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to