[ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Bernstein updated SOLR-6581:
---------------------------------
    Attachment: SOLR-6581.patch

Added more error handling and removed all debugging/timing code.

> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch,
> SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch,
> SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch,
> SOLR-6581.patch, SOLR-6581.patch, renames.diff
>
>
> *Background*
> The 4.x implementations of the CollapsingQParserPlugin and the ExpandComponent
> are optimized to work with a top-level FieldCache. Top-level FieldCaches provide
> a very fast lookup from docID to top-level ordinal. Fast access to the top-level
> ordinals allows for very high-performance field collapsing on high-cardinality
> fields.
> LUCENE-5666 unified the DocValues and FieldCache APIs, so the top-level
> FieldCache is no longer in regular use. Instead, all top-level caches are
> accessed through MultiDocValues.
> Using MultiDocValues rather than a top-level FieldCache has some major
> advantages, but there is one disadvantage: the lookup from docID to top-level
> ordinal is slower with MultiDocValues.
> My testing has shown that *after optimizing* the CollapsingQParserPlugin code
> to use MultiDocValues, the performance drop is around 100%. For some use
> cases this performance drop is a blocker.
> *What About Faceting?*
> String faceting also relies on the top-level ordinals. Is faceting
> performance also affected? My testing has shown that faceting performance
> is affected much less than collapsing.
> One possible reason for this may be that field collapsing is memory bound and
> faceting is not.
> So the additional memory accesses needed for MultiDocValues affect field
> collapsing much more than faceting.
> *Proposed Solution*
> The proposed solution is to have the default Collapse and Expand algorithms
> use MultiDocValues, but to provide an option to use a top-level FieldCache if
> the performance of MultiDocValues is a blocker.
> The proposed mechanism for switching to the FieldCache is a new "hint"
> parameter. If the hint parameter is set to "FAST_QUERY", the top-level
> FieldCache would be used for both Collapse and Expand.
> Example syntax:
> {code}
> fq={!collapse field=x hint=FAST_QUERY}
> {code}
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
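The difference between the two lookup paths described in the issue can be sketched as follows. This is a simplified, hypothetical model using plain int arrays, not Solr or Lucene code: `OrdinalLookupSketch`, `TopLevelOrds`, `MultiSegmentOrds` and `choose` are illustrative names. The multi-segment path is only loosely modeled on the idea behind MultiDocValues (locate the segment, read the segment-local ordinal, then translate it to a global ordinal). It assumes every document has a value and that no segment is empty. The `choose` method mimics the proposed `hint=FAST_QUERY` switch; it is not the real parameter handling.

```java
import java.util.Arrays;

// Hypothetical model (not Solr/Lucene code) of two ways to resolve
// a top-level docID to a top-level (global) ordinal.
public class OrdinalLookupSketch {

    /** Common view: top-level docID -> top-level ordinal. */
    interface OrdLookup {
        int ord(int docId);
    }

    /** Top-level FieldCache style: a single array read per document. */
    static final class TopLevelOrds implements OrdLookup {
        private final int[] ordByDoc; // indexed by top-level docID

        TopLevelOrds(int[] ordByDoc) {
            this.ordByDoc = ordByDoc;
        }

        @Override
        public int ord(int docId) {
            return ordByDoc[docId];
        }
    }

    /**
     * MultiDocValues-like style: find the segment, read the segment ordinal,
     * then map it to a global ordinal. Assumes no empty segments and that
     * every document has a value.
     */
    static final class MultiSegmentOrds implements OrdLookup {
        private final int[] docBases;      // first top-level docID of each segment, ascending
        private final int[][] segOrdByDoc; // per segment: local docID -> segment ordinal
        private final int[][] segToGlobal; // per segment: segment ordinal -> global ordinal

        MultiSegmentOrds(int[] docBases, int[][] segOrdByDoc, int[][] segToGlobal) {
            this.docBases = docBases;
            this.segOrdByDoc = segOrdByDoc;
            this.segToGlobal = segToGlobal;
        }

        @Override
        public int ord(int docId) {
            // Step 1: binary search for the segment whose doc range contains docId.
            int idx = Arrays.binarySearch(docBases, docId);
            int seg = idx >= 0 ? idx : -idx - 2;
            // Step 2: read the segment-local ordinal.
            int segOrd = segOrdByDoc[seg][docId - docBases[seg]];
            // Step 3: translate to the global ordinal (an extra memory access).
            return segToGlobal[seg][segOrd];
        }
    }

    /** Hypothetical hint switch: "FAST_QUERY" selects the top-level path. */
    static OrdLookup choose(String hint, TopLevelOrds topLevel, MultiSegmentOrds multi) {
        return "FAST_QUERY".equals(hint) ? topLevel : multi;
    }

    // Example data: global terms a=0, b=1, c=2, d=3.
    // Segment 0 (docs 0-2) holds b, a, b; segment 1 (docs 3-4) holds d, c.
    static TopLevelOrds exampleTopLevel() {
        return new TopLevelOrds(new int[] {1, 0, 1, 3, 2});
    }

    static MultiSegmentOrds exampleMultiSegment() {
        return new MultiSegmentOrds(
            new int[] {0, 3},
            new int[][] {{1, 0, 1}, {1, 0}}, // seg 0 terms [a, b], seg 1 terms [c, d]
            new int[][] {{0, 1}, {2, 3}});   // segment ordinal -> global ordinal
    }

    public static void main(String[] args) {
        TopLevelOrds top = exampleTopLevel();
        MultiSegmentOrds multi = exampleMultiSegment();
        for (int doc = 0; doc < 5; doc++) {
            System.out.printf("doc %d: top-level ord %d, multi-segment ord %d%n",
                doc, top.ord(doc), multi.ord(doc));
        }
        System.out.println("hint=FAST_QUERY uses top-level: "
            + (choose("FAST_QUERY", top, multi) == top));
    }
}
```

Both paths return the same ordinals. The multi-segment path does a binary search plus two dependent array reads per document, where the top-level path does one read. That is consistent with the issue's suggestion that memory-bound collapsing suffers more from MultiDocValues, though this toy model does not measure it.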