[ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
---------------------------------
    Description: 
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages to using MultiDocValues rather than a top 
level FieldCache. But the lookup from docID to top-level ordinals is slower 
using MultiDocValues.
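
A rough way to picture the difference between the two lookup paths is sketched below. This is a simplified model in plain Java, not Lucene code: the class name, the arrays, and the three-step segmented lookup are assumptions made for illustration, and the real MultiDocValues implementation differs in detail. The point it tries to show is only that the segmented path touches more memory per document than a single top-level array does.

```java
import java.util.Arrays;

/** Simplified model of the two lookup paths; hypothetical names, not Lucene code. */
final class OrdinalLookupSketch {

    // Two segments: global docs 0-2 live in segment 0, docs 3-4 in segment 1.
    static final int[] DOC_BASES = {0, 3};

    // Segment-local ordinals per document.
    // Segment 0 holds terms "a","c"; segment 1 holds terms "b","c".
    static final int[][] SEGMENT_ORDS = {{0, 1, 0}, {1, 0}};

    // Per-segment map from segment ordinal to global ordinal (a=0, b=1, c=2).
    static final int[][] SEG_TO_GLOBAL = {{0, 2}, {1, 2}};

    // Top-level cache: one array indexed directly by global docID.
    static final int[] TOP_LEVEL_ORDS = {0, 2, 0, 2, 1};

    /** Top-level path: a single array read. */
    static int topLevelOrd(int docId) {
        return TOP_LEVEL_ORDS[docId];
    }

    /** Segmented path: locate the segment, read the local ordinal, then map it. */
    static int multiOrd(int docId) {
        int seg = Arrays.binarySearch(DOC_BASES, docId);
        if (seg < 0) {
            seg = -seg - 2; // insertion point minus one is the containing segment
        }
        int localDoc = docId - DOC_BASES[seg];
        int segOrd = SEGMENT_ORDS[seg][localDoc];
        return SEG_TO_GLOBAL[seg][segOrd];
    }

    public static void main(String[] args) {
        for (int doc = 0; doc < TOP_LEVEL_ORDS.length; doc++) {
            System.out.println("doc " + doc + ": top-level=" + topLevelOrd(doc)
                    + " segmented=" + multiOrd(doc));
        }
    }
}
```

Both methods return the same ordinal for every document. The segmented path reaches it through three dependent reads instead of one, which may be part of why a memory-bound operation like collapsing is hit harder.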

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
affected also? My testing has shown that faceting performance is affected 
much less than collapsing. 

One possible reason for this is that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
affect field collapsing much more than faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithms use 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new "hint" 
parameter. If the hint parameter is set to "FAST_QUERY" then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}
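
As an illustration of how such a hint could drive the choice of ordinal source, here is a minimal sketch. Everything in it is hypothetical: the class, the enum, and the tiny local-params parser are stand-ins written for this example and are not taken from the attached patch or from Solr's own parsing code. It assumes unquoted, whitespace-separated key=value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of hint-based strategy selection; not Solr code. */
final class CollapseHintSketch {

    enum OrdinalSource { MULTI_DOC_VALUES, TOP_LEVEL_FIELD_CACHE }

    /** Parses "{!type key=value ...}" into a map; quoting is not supported. */
    static Map<String, String> parseLocalParams(String fq) {
        String body = fq.trim();
        if (!body.startsWith("{!") || !body.endsWith("}")) {
            throw new IllegalArgumentException("not a local-params expression: " + fq);
        }
        String[] tokens = body.substring(2, body.length() - 1).trim().split("\\s+");
        Map<String, String> params = new LinkedHashMap<>();
        params.put("type", tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq > 0) {
                params.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
            }
        }
        return params;
    }

    /** Default to MultiDocValues; switch only when hint=FAST_QUERY is present. */
    static OrdinalSource chooseSource(Map<String, String> params) {
        return "FAST_QUERY".equals(params.get("hint"))
                ? OrdinalSource.TOP_LEVEL_FIELD_CACHE
                : OrdinalSource.MULTI_DOC_VALUES;
    }

    public static void main(String[] args) {
        String withHint = "{!collapse field=x hint=FAST_QUERY}";
        String withoutHint = "{!collapse field=x}";
        System.out.println(chooseSource(parseLocalParams(withHint)));
        System.out.println(chooseSource(parseLocalParams(withoutHint)));
    }
}
```

Keeping MultiDocValues as the default means existing queries are unaffected, and only requests that opt in with the hint pay the cost of building the top-level cache.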

> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
