[jira] [Commented] (SOLR-12407) edismax boost performance regression from switch to FunctionScoreQuery

Alan Woodward (JIRA) Thu, 14 Jun 2018 01:23:13 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512146#comment-16512146
 ]


Alan Woodward commented on SOLR-12407:
--------------------------------------

I'd missed this, sorry for not picking it up earlier.

It looks as though BoostedQuery just assumed that all docs had a value, so 
avoided the exists check.  I'm not entirely sure that's a safe assumption, but 
from looking at a few ValueSource implementations it does seem that they have 
default return values and so the exists check isn't actually necessary.  So the 
best solution is I think to remove the exists call from 
ValueSource.WrappedDoubleValuesSource and always return true from advanceExact()

> edismax boost performance regression from switch to FunctionScoreQuery
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12407
>                 URL: https://issues.apache.org/jira/browse/SOLR-12407
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.3
>            Reporter: Will Currie
>            Priority: Minor
>         Attachments: restore-boosted-query.patch, solr-7.2.svg, solr-7.3.svg
>
>
> Assertion: FunctionScoreQuery uses the iterator style API (advanceExact + 
> doubleValue). BoostedQuery uses the "old" api (just a single call to 
> doubleValue). In an edismax boost this means the boost function is called 
> twice for every document being scored in 7.3 instead of once in 7.2.
> I'm seeing ~50% increase in query response time after upgrading from 7.2 to 
> 7.3 (600ms to 900ms). My queries use an edismax boost something like:
> {noformat}
> if(termfreq(type,"A"),product(map(field1,3,3,1.5,1),map(field1,4,4,1.9,1),if(def(field2,false),product(map(field1,1,1,0.6,1),map(field1,2,2,0.7,1),if(not(exists(field1)),0.6,1),map(field3,0,0,1.3,1)),product(map(field1,1,1,0.7,1),map(field1,2,2,1.1,1),if(not(exists(field1)),0.90,1),map(field3,0,0,1.50,1)))),1){noformat}
> This boost is likely (surely?) suboptimal but LUCENE-8099 appears to have 
> introduced this performance regression (poured proverbial oil on my 
> smouldering fire). If I change ExtendedDismaxQParser back to using the 
> deprecated BoostedQuery I get the 600ms solr 7.2 response time back.
> It appears FunctionScoreQuery invokes the boost function twice for each 
> document. Once with a call to 
> [exists()|https://github.com/apache/lucene-solr/blob/03afeb7766a39996de3c85e8a6ab24d2a352dd1c/lucene/queries/src/java/org/apache/lucene/queries/function/ValueSource.java#L150]
>  from 
> [advanceExact()|https://github.com/apache/lucene-solr/blob/42154387d4f2a6060da09c4236e2a8dbb575c59e/lucene/queries/src/java/org/apache/lucene/queries/function/FunctionScoreQuery.java#L170],
>  then a second time from the call chain following scores.doubleValue().
> I don't know if that's the cause of the slowdown but I'm definitely seeing a 
> slowdown that disappears when I revert part of LUCENE-8099.
> I've attached some flamegraphs comparing 7.2 and 7.3. The frame 
> FunctionScoreQuery$FunctionScoreWeight$1.score in solr-7.3.svg show 2 
> "towers". One for advanceExact (calling exists()), the other for 
> doubleValue() which ends up similar to solr-7.2.svg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-12407) edismax boost performance regression from switch to FunctionScoreQuery

Reply via email to