[
https://issues.apache.org/jira/browse/SOLR-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587695#comment-16587695
]
STANISLAV LIVOTOV commented on SOLR-12688:
------------------------------------------
[~cpoerschke] Can you please take a look at these changes?
> LTR Multiple performance fixes + pure DocValues support for FieldValueFeature
> -----------------------------------------------------------------------------
>
> Key: SOLR-12688
> URL: https://issues.apache.org/jira/browse/SOLR-12688
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: STANISLAV LIVOTOV
> Priority: Major
> Attachments: LTRModelHashCodeAfter.png, LTRModelHashCodeBefore.png,
> LTRSolrFeatureAfter.png, LTRSolrFeatureBefore.png, LTRwithDVOptimisation.png,
> LTRwithoutDVOptimisation.png, MultiplePerformanceFixes.patch
>
>
> This ticket covers two performance issues and one functional/performance
> issue that I found while integrating LTR into our e-commerce search engine:
> # FieldValueFeature doesn't support pure DocValues fields (stored=false).
> Please also note that for fields which are both stored and DocValues it is
> suboptimal, because it extracts a single field from the stored document.
> DocValues are faster in such circumstances. Below are screenshots of JFR
> profiles without and with the new DocValues support, for the case when the
> value can be read from DocValues.
> !LTRwithoutDVOptimisation.png!
> !LTRwithDVOptimisation.png!
> # SolrFeature was not optimally implemented for the case when no fq
> parameter is passed. I'm not entirely sure what the intention behind
> introducing the fq parameter for SolrFeature was, so I decided not to change
> the behavior and only optimize the described case.
> !LTRSolrFeatureBefore.png!
> !LTRSolrFeatureAfter.png!
> # LTRScoringModel was a mutable object. This led to the hash code being
> calculated on each query, which can consume a lot of time when a model is
> big (in our case we were using LambdaMART with 100 trees and leaves, which
> took 3MB on disk). So I decided to make LTRScoringModel immutable and cache
> the hashCode calculation. Below are the screenshots before and after.
> !LTRModelHashCodeBefore.png!
> !LTRModelHashCodeAfter.png!
> In our case, we had a feature.json file with 8 FieldValueFeatures, 5
> SolrFeatures and 1 OriginalScoreFeature.
> Before introducing the optimizations, the performance overhead of LTR
> reranking the top 48 documents was 300ms. With all the optimizations in
> place, it decreased to 35ms.
> Please also note that the JFR screenshots were captured on the Solr 6.6
> codebase. All the numbers are also taken from Solr version 6.6.
> I hope that the changes to the DocValues interface (method get() was removed
> and advanceExact was added) won't affect it (at least for
> DenseNumericDocValues it will work as expected).
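
For readers who do not have the patch at hand, the idea behind the third item of the quoted description (make the model immutable, then compute the hash code once) can be sketched roughly as follows. This is a minimal illustration, not the code in MultiplePerformanceFixes.patch: the class ImmutableModelSketch and its fields (name, featureNames, weights) are hypothetical stand-ins and do not mirror the real LTRScoringModel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a scoring model; NOT the actual LTRScoringModel.
final class ImmutableModelSketch {
    private final String name;
    private final List<String> featureNames;
    private final double[] weights;
    // Computed once in the constructor; safe because no field can change later.
    private final int cachedHashCode;

    ImmutableModelSketch(String name, List<String> featureNames, double[] weights) {
        this.name = Objects.requireNonNull(name);
        // Defensive copies so callers cannot mutate state after construction.
        this.featureNames = Collections.unmodifiableList(new ArrayList<>(featureNames));
        this.weights = weights.clone();
        this.cachedHashCode = computeHashCode();
    }

    // The potentially expensive walk over the whole model happens only here.
    private int computeHashCode() {
        int result = Objects.hash(name, featureNames);
        return 31 * result + Arrays.hashCode(weights);
    }

    @Override
    public int hashCode() {
        // Constant-time lookup on every query instead of a full recomputation.
        return cachedHashCode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableModelSketch)) return false;
        ImmutableModelSketch other = (ImmutableModelSketch) o;
        // Cheap hash comparison first, then the full field comparison.
        return cachedHashCode == other.cachedHashCode
            && name.equals(other.name)
            && featureNames.equals(other.featureNames)
            && Arrays.equals(weights, other.weights);
    }
}
```

The defensive copies matter here: caching is only safe if nothing reachable from the object can change after the hash has been computed, which is presumably why the description pairs the caching with making the model immutable.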
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)