[ https://issues.apache.org/jira/browse/SOLR-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stanislav Livotov updated SOLR-12688:
-------------------------------------
    Attachment: MultiplePerformanceFixes.patch

> LTR Multiple performance fixes + pure DocValues support for FieldValueFeature
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-12688
>                 URL: https://issues.apache.org/jira/browse/SOLR-12688
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Stanislav Livotov
>            Priority: Major
>         Attachments: LTRModelHashCodeAfter.png, LTRModelHashCodeBefore.png,
> LTRSolrFeatureAfter.png, LTRSolrFeatureBefore.png, LTRwithDVOptimisation.png,
> LTRwithoutDVOptimisation.png, MultiplePerformanceFixes.patch
>
>
> This ticket covers 2 performance issues and 1 functional/performance issue
> that I found while integrating LTR into our e-commerce search engine:
> # FieldValueFeature doesn't support pure DocValues fields (stored=false).
> Please also note that for fields which are both stored and DocValues it is
> not optimal, because it reads the stored document to extract just one field.
> DocValues are faster in that case. Below are screenshots of JFR profiles
> without and with the new DocValues support, for the case when the value can
> be read from DocValues.
> !LTRwithoutDVOptimisation.png!
> !LTRwithDVOptimisation.png!
> # SolrFeature was not optimally implemented for the case when no fq
> parameter is passed. I'm not entirely sure why the fq parameter was
> introduced for SolrFeature at all, so I decided not to change the behavior
> and only optimized the described case.
> !LTRSolrFeatureBefore.png!
> !LTRSolrFeatureAfter.png!
> # LTRScoringModel was a mutable object. This led to the hash code being
> recalculated on each query, which can consume a lot of time when the model is
> big (in our case we were using LambdaMART with 100 trees and leaves, which
> took up 3MB on disk). So I decided to make LTRScoringModel immutable and
> cache the hashCode calculation. Below are the screenshots before and after.
> !LTRModelHashCodeBefore.png!
> !LTRModelHashCodeAfter.png!
> In our case, we had a feature.json file with 8 FieldValueFeatures, 5
> SolrFeatures and 1 OriginalScoreFeature.
> Before the optimizations, the performance overhead of LTR reranking of the
> top 48 documents was 300ms. With all the optimizations applied, it decreased
> to 35ms.
> Please also note that the JFR screenshots were captured on the Solr 6.6
> codebase. All the numbers are also taken from Solr 6.6.
> I hope that the changes to the DocValues interface (method get() was removed
> and advanceExact was added) won't affect it (at least for
> DenseNumericDocValues it will work as expected).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
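
For readers who have not opened the patch, the idea in the third item (make the model immutable so its hash code can be computed once and reused) can be sketched roughly as below. This is only an illustration under assumptions: the class name CachedHashModel, its fields, and the fullHashComputations counter are hypothetical and are not taken from MultiplePerformanceFixes.patch or from Solr's LTRScoringModel.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a large scoring model; not Solr's LTRScoringModel. */
final class CachedHashModel {
    // Demo-only counter showing how often the full hash is actually computed.
    static int fullHashComputations = 0;

    private final String name;
    private final float[] weights; // defensive copy, never exposed or mutated
    private final int hash;        // computed once, in the constructor

    CachedHashModel(String name, float[] weights) {
        this.name = name;
        this.weights = weights.clone();
        this.hash = computeHash();
    }

    // Walks the whole model; this is the expensive part for a big model.
    private int computeHash() {
        fullHashComputations++;
        return 31 * name.hashCode() + Arrays.hashCode(weights);
    }

    @Override
    public int hashCode() {
        return hash; // constant time per call, safe because no field can change
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashModel)) return false;
        CachedHashModel other = (CachedHashModel) o;
        return hash == other.hash
            && name.equals(other.name)
            && Arrays.equals(weights, other.weights);
    }
}
```

Caching is only safe here because every field is final and the array is copied on construction. With a mutable model, a cached value could go stale, which is presumably why a mutable class would have to recompute the hash on every call.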