Re: [PR] SOLR-16667: LTR Add feature vector caching for ranking [solr]

via GitHub Fri, 25 Jul 2025 06:32:54 -0700


alessandrobenedetti commented on code in PR #3433:
URL: https://github.com/apache/solr/pull/3433#discussion_r2231064740



##########
solr/modules/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java:
##########
@@ -551,16 +554,117 @@ public DocIdSetIterator iterator() {
         return featureTraversalScorer.iterator();
       }
 
-      private class SparseModelScorer extends Scorer {
-        private final DisiPriorityQueue subScorers;
-        private final ScoringQuerySparseIterator itr;
+      public void setIsLogging(boolean isLogging) {
+        this.isLogging = isLogging;
+      }
 
-        private int targetDoc = -1;
-        private int activeDoc = -1;
+      abstract class FeatureTraversalScorer extends Scorer {
+        protected int targetDoc = -1;
+        protected int activeDoc = -1;
+        protected LeafReaderContext leafContext;
 
-        private SparseModelScorer(
-            Weight weight, List<Feature.FeatureWeight.FeatureScorer> 
featureScorers) {
+        protected FeatureTraversalScorer(Weight weight, LeafReaderContext 
leafContext) {
           super(weight);
+          this.leafContext = leafContext;
+        }
+
+        @Override
+        public float score() throws IOException {
+          reset();
+          fillFeaturesInfo();
+          return makeNormalizedFeaturesAndScore();
+        }
+
+        @Override
+        public float getMaxScore(int upTo) throws IOException {
+          return Float.POSITIVE_INFINITY;
+        }
+
+        private void fillFeaturesInfo() throws IOException {

Review Comment:
   not sure this is a 'scorer' responsibility, it feels more like 
'FeatureExtractor' kind of responsibility, please investigate a new class that 
can be used by both the scorer and logger



##########
solr/modules/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java:
##########
@@ -551,16 +554,117 @@ public DocIdSetIterator iterator() {
         return featureTraversalScorer.iterator();
       }
 
-      private class SparseModelScorer extends Scorer {
-        private final DisiPriorityQueue subScorers;
-        private final ScoringQuerySparseIterator itr;
+      public void setIsLogging(boolean isLogging) {
+        this.isLogging = isLogging;
+      }
 
-        private int targetDoc = -1;
-        private int activeDoc = -1;
+      abstract class FeatureTraversalScorer extends Scorer {
+        protected int targetDoc = -1;
+        protected int activeDoc = -1;
+        protected LeafReaderContext leafContext;
 
-        private SparseModelScorer(
-            Weight weight, List<Feature.FeatureWeight.FeatureScorer> 
featureScorers) {
+        protected FeatureTraversalScorer(Weight weight, LeafReaderContext 
leafContext) {
           super(weight);
+          this.leafContext = leafContext;
+        }
+
+        @Override
+        public float score() throws IOException {
+          reset();
+          fillFeaturesInfo();
+          return makeNormalizedFeaturesAndScore();
+        }
+
+        @Override
+        public float getMaxScore(int upTo) throws IOException {
+          return Float.POSITIVE_INFINITY;
+        }
+
+        private void fillFeaturesInfo() throws IOException {
+          if (activeDoc == targetDoc) {
+            SolrCache<Integer, float[]> featureVectorCache = null;
+            float[] featureVector;
+
+            // Check added otherwise 
org.apache.solr.ltr.TestLTRScoringQuery.testLTRScoringQuery
+            // and 
org.apache.solr.ltr.TestSelectiveWeightCreation.testScoringQueryWeightCreation
+            // fail

Review Comment:
   definitely need to elaborate here



##########
solr/modules/ltr/src/test-files/solr/collection1/conf/solrconfig-ltr.xml:
##########
@@ -11,7 +11,7 @@
  language governing permissions and limitations under the License. -->
 
 <config>
-    <luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>
+ <luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>

Review Comment:
   do we need this change?



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -716,6 +727,11 @@ public <K, V> boolean regenerateItem(
           });
     }
 
+    if (solrConfig.featureVectorCacheConfig != null
+        && solrConfig.featureVectorCacheConfig.getRegenerator() == null) {
+      solrConfig.featureVectorCacheConfig.setRegenerator(new 
NoOpRegenerator());

Review Comment:
   With noOpRegenerator, we don't have autowarming, do we?
   Any motivation?



##########
solr/modules/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java:
##########
@@ -423,16 +420,13 @@ private void implTransform(SolrDocument doc, int docid, 
DocIterationInfo docInfo
         }
       }
       if (!(rerankingQuery instanceof OriginalRankingLTRScoringQuery) || 
hasExplicitFeatureStore) {
-        Object featureVector = featureLogger.getFeatureVector(docid, 
rerankingQuery, searcher);
-        if (featureVector == null) { // FV for this document was not in the 
cache
-          featureVector =
-              featureLogger.makeFeatureVector(
-                  LTRRescorer.extractFeaturesInfo(
-                      rerankingModelWeight,
-                      docid,
-                      (!docsWereReranked && docsHaveScores) ? docInfo.score() 
: null,
-                      leafContexts));
-        }
+        String featureVector =
+            featureLogger.printFeatureVector(
+                LTRRescorer.extractFeaturesInfo(

Review Comment:
   mmmm extractFeatureInfo could actually just read from cache?
   maybe we need to change the name then to something like
   'fetchCacheableFeatureVector' or something similar



##########
solr/modules/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java:
##########
@@ -75,11 +75,10 @@ public class LTRFeatureLoggerTransformerFactory extends 
TransformerFactory {
   // used inside fl to specify to log (all|model only) features
   private static final String FV_LOG_ALL = "logAll";
 
-  private static final String DEFAULT_LOGGING_MODEL_NAME = "logging-model";
+  public static final String DEFAULT_LOGGING_MODEL_NAME = "logging-model";

Review Comment:
   do we need this public? maybe package level? maybe we need to refactor the 
packages a bit



##########
solr/modules/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java:
##########
@@ -368,17 +376,7 @@ public class ModelWeight extends Weight {
     private final float[] modelFeatureValuesNormalized;
     private final Feature.FeatureWeight[] extractedFeatureWeights;
 
-    // List of all the feature names, values - used for both scoring and 
logging
-    /*
-     *  What is the advantage of using a hashmap here instead of an array of 
objects?
-     *     A set of arrays was used earlier and the elements were accessed 
using the featureId.
-     *     With the updated logic to create weights selectively,
-     *     the number of elements in the array can be fewer than the total 
number of features.
-     *     When [features] are not requested, only the model features are 
extracted.
-     *     In this case, the indexing by featureId, fails. For this reason,
-     *     we need a map which holds just the features that were triggered by 
the documents in the result set.
-     *
-     */
+    // All the features

Review Comment:
   too vague, all the features across stores? Within one feature store? Within 
one model?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] SOLR-16667: LTR Add feature vector caching for ranking [solr]

Reply via email to