[GitHub] [solr] cpoerschke commented on a change in pull request #166: SOLR-15450 add PrefetchingFieldValueFeature

GitBox Wed, 09 Jun 2021 10:51:15 -0700


cpoerschke commented on a change in pull request #166:
URL: https://github.com/apache/solr/pull/166#discussion_r648536798




##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/PrefetchingFieldValueFeature.java
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.ltr.feature;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.search.SolrDocumentFetcher;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * This feature returns the value of a field in the current document.
+ * The field must have stored="true" or docValues="true" properties.
+ * Example configuration:
+ * <pre>{
+  "name":  "rawHits",
+  "class": "org.apache.solr.ltr.feature.PrefetchingFieldValueFeature",
+  "params": {
+      "field": "hits"
+  }
+}</pre>
+ */
+public class PrefetchingFieldValueFeature extends FieldValueFeature {
+  // used to store all fields from all PrefetchingFieldValueFeatures
+  private Set<String> prefetchFields;
+  // can be used for debugging to only fetch the field this features uses
+  public static final String DISABLE_PREFETCHING_FIELD_VALUE_FEATURE = 
"disablePrefetchingFieldValueFeature";
+
+  public void setPrefetchFields(Set<String> fields) {
+    prefetchFields = fields;
+  }
+
+  @VisibleForTesting
+  public Set<String> getPrefetchFields(){
+    return prefetchFields;
+  }
+
+  public PrefetchingFieldValueFeature(String name, Map<String,Object> params) {
+    super(name, params);
+  }
+
+  @Override
+  public FeatureWeight createWeight(IndexSearcher searcher, boolean 
needsScores,
+      SolrQueryRequest request, Query originalQuery, Map<String,String[]> efi)
+          throws IOException {
+    return new PrefetchingFieldValueFeatureWeight(searcher, request, 
originalQuery, efi);
+  }
+
+  public class PrefetchingFieldValueFeatureWeight extends 
FieldValueFeatureWeight {
+    private final SolrDocumentFetcher docFetcher;
+    private final Boolean disablePrefetching;
+
+    public PrefetchingFieldValueFeatureWeight(IndexSearcher searcher,
+        SolrQueryRequest request, Query originalQuery, Map<String,String[]> 
efi) {
+      super(searcher, request, originalQuery, efi);
+
+      disablePrefetching = 
request.getParams().getBool(DISABLE_PREFETCHING_FIELD_VALUE_FEATURE, false);
+      // get the searcher directly from the request to be sure that we have a 
SolrIndexSearcher
+      this.docFetcher = request.getSearcher().getDocFetcher();
+    }
+
+    /**
+     * Return a FeatureScorer that works with stored fields and makes use of 
the cache if the configured field is stored
+     * and has no docValues.
+     * Otherwise, delegate the work to the FieldValueFeature.
+     *
+     * @param context the segment this FeatureScorer is working with
+     * @return FeatureScorer for the current segment and field
+     * @throws IOException as defined by abstract class Feature
+     */
+    @Override
+    public FeatureScorer scorer(LeafReaderContext context) throws IOException {
+      if (schemaField != null && !schemaField.stored() && 
schemaField.hasDocValues()) {
+        return super.scorer(context);
+      }
+      return new PrefetchingFieldValueFeatureScorer(this, context,
+          DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS));
+    }
+
+    /**
+     * A FeatureScorer that reads the stored value for a field
+     * docFetcher does not request a single field but all the prefetchFields 
to improve performance through caching
+     */
+    public class PrefetchingFieldValueFeatureScorer extends 
FieldValueFeatureScorer {
+
+      public PrefetchingFieldValueFeatureScorer(FeatureWeight weight,
+          LeafReaderContext context, DocIdSetIterator itr) {
+        super(weight, context, itr);
+      }
+
+      @Override
+      public float score() throws IOException {
+        try {
+          final Document document;
+          if(disablePrefetching) {
+            document = docFetcher.doc(context.docBase + itr.docID(), 
getFieldAsSet());
+          } else {
+            document = docFetcher.doc(context.docBase + itr.docID(), 
prefetchFields);
+          }

Review comment:
       If we had a `fetchDocument` method here for this block of code and did 
not provide a `disablePrefetching` option then the `fetchDocument` method would 
provide a hook via which expert users (and test classes) can build their own 
`MyPrefetchingFieldValueFeature extends PrefetchingFieldValueFeature` class 
with prefetching disabling support _and_ they might choose a non-boolean 
implementation e.g. `noPrefetchFeatures=foo,bar` could disable prefetching only 
for the features named `foo` and `bar` or `noPrefetchFeaturePrefix=bla` for 
partial matching against the feature name to cover `bla1` and `blaBla` and 
`blaBlaBla` etc. feature names. Or perhaps prefetching disabling should not be 
by feature name but by field name?
   
   ```
   protected Document fetchDocument() {
     ...
   }
   ```

##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/PrefetchingFieldValueFeature.java
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.ltr.feature;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.search.SolrDocumentFetcher;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * This feature returns the value of a field in the current document.
+ * The field must have stored="true" or docValues="true" properties.
+ * Example configuration:
+ * <pre>{
+  "name":  "rawHits",
+  "class": "org.apache.solr.ltr.feature.PrefetchingFieldValueFeature",
+  "params": {
+      "field": "hits"
+  }
+}</pre>
+ */
+public class PrefetchingFieldValueFeature extends FieldValueFeature {
+  // used to store all fields from all PrefetchingFieldValueFeatures
+  private Set<String> prefetchFields;
+  // can be used for debugging to only fetch the field this features uses
+  public static final String DISABLE_PREFETCHING_FIELD_VALUE_FEATURE = 
"disablePrefetchingFieldValueFeature";
+
+  public void setPrefetchFields(Set<String> fields) {
+    prefetchFields = fields;
+  }
+
+  @VisibleForTesting
+  public Set<String> getPrefetchFields(){
+    return prefetchFields;
+  }
+
+  public PrefetchingFieldValueFeature(String name, Map<String,Object> params) {
+    super(name, params);
+  }
+
+  @Override
+  public FeatureWeight createWeight(IndexSearcher searcher, boolean 
needsScores,
+      SolrQueryRequest request, Query originalQuery, Map<String,String[]> efi)
+          throws IOException {
+    return new PrefetchingFieldValueFeatureWeight(searcher, request, 
originalQuery, efi);
+  }
+
+  public class PrefetchingFieldValueFeatureWeight extends 
FieldValueFeatureWeight {
+    private final SolrDocumentFetcher docFetcher;
+    private final Boolean disablePrefetching;
+
+    public PrefetchingFieldValueFeatureWeight(IndexSearcher searcher,
+        SolrQueryRequest request, Query originalQuery, Map<String,String[]> 
efi) {
+      super(searcher, request, originalQuery, efi);
+
+      disablePrefetching = 
request.getParams().getBool(DISABLE_PREFETCHING_FIELD_VALUE_FEATURE, false);

Review comment:
       Hmm, interesting idea, I'd like to go away and ponder it a little. My 
initial thoughts are that yes debugging can be facilitated this way but at a 
cost of (marginal) additional code complexity and from a general user 
perspective it's another flag to understand and use (or not use), and with that 
in mind my intuitive preference would be to not have it, but I'd like to go and 
think about it some more properly.

##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/PrefetchingFieldValueFeature.java
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.ltr.feature;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.search.SolrDocumentFetcher;
+import org.apache.solr.search.SolrIndexSearcher;
+
+import java.io.IOException;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * This feature returns the value of a field in the current document.
+ * The field must have stored="true" or docValues="true" properties.
+ * Example configuration:
+ * <pre>{
+  "name":  "rawHits",
+  "class": "org.apache.solr.ltr.feature.FieldValueFeature",
+  "params": {
+      "field": "hits"
+  }
+}</pre>
+ */
+public class PrefetchingFieldValueFeature extends FieldValueFeature {
+  // used to store all fields from all PrefetchingFieldValueFeatures
+  private Set<String> prefetchFields;
+
+  public void setField(String field) {
+    this.field = field;
+  }
+
+  public void setPrefetchFields(Set<String> fields) {
+    prefetchFields = fields;
+  }
+
+  @Override
+  public LinkedHashMap<String,Object> paramsToMap() {
+    final LinkedHashMap<String,Object> params = defaultParamsToMap();
+    params.put("field", field);
+    return params;
+  }
+
+  @Override
+  protected void validate() throws FeatureException {
+    if (field == null || field.isEmpty()) {
+      throw new FeatureException(getClass().getSimpleName()+
+          ": field must be provided");
+    }
+  }
+
+  public PrefetchingFieldValueFeature(String name, Map<String,Object> params) {
+    super(name, params);
+  }
+
+  @Override
+  public FeatureWeight createWeight(IndexSearcher searcher, boolean 
needsScores,
+      SolrQueryRequest request, Query originalQuery, Map<String,String[]> efi)
+          throws IOException {
+    return new PrefetchingFieldValueFeatureWeight(searcher, request, 
originalQuery, efi);
+  }
+
+  @VisibleForTesting
+  public Set<String> getPrefetchFields(){
+    return prefetchFields;
+  }
+
+  public class PrefetchingFieldValueFeatureWeight extends 
FieldValueFeatureWeight {
+    private final SchemaField schemaField;
+    private final SolrDocumentFetcher docFetcher;
+
+    public PrefetchingFieldValueFeatureWeight(IndexSearcher searcher,
+        SolrQueryRequest request, Query originalQuery, Map<String,String[]> 
efi) {
+      super(searcher, request, originalQuery, efi);
+      if (searcher instanceof SolrIndexSearcher) {
+        schemaField = ((SolrIndexSearcher) 
searcher).getSchema().getFieldOrNull(field);
+      } else {
+        schemaField = null;
+      }
+      this.docFetcher = request.getSearcher().getDocFetcher();
+    }
+
+    /**
+     * Return a FeatureScorer that work with stored fields and makes use of 
the cache if the configured field is stored
+     * and has no docValues.
+     * Otherwise, delegate the work to the FieldValueFeature.
+     *
+     * @param context the segment this FeatureScorer is working with
+     * @return FeatureScorer for the current segment and field
+     * @throws IOException as defined by abstract class Feature
+     */
+    @Override
+    public FeatureScorer scorer(LeafReaderContext context) throws IOException {
+      if (schemaField != null && !schemaField.stored() && 
schemaField.hasDocValues()) {
+        return super.scorer(context);
+      }
+      return new PrefetchingFieldValueFeatureScorer(this, context,
+          DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS));
+    }
+
+    /**
+     * A FeatureScorer that reads the stored value for a field
+     * docFetcher does not request a single field but all the prefetchFields 
to improve performance through caching
+     */
+    public class PrefetchingFieldValueFeatureScorer extends 
FieldValueFeatureScorer {
+
+      public PrefetchingFieldValueFeatureScorer(FeatureWeight weight,
+          LeafReaderContext context, DocIdSetIterator itr) {
+        super(weight, context, itr);
+      }
+
+      @Override
+      public float score() throws IOException {
+        try {
+          final Document document = docFetcher.doc(context.docBase + 
itr.docID(), prefetchFields);
+
+          return super.parseStoredFieldValue(document.getField(field));
+        } catch (final IOException e) {
+          throw new FeatureException(
+              e.toString() + ": " +
+                  "Unable to extract feature for "
+                  + name, e);

Review comment:
       > Also with troubleshooting in mind, I wonder if `prefetchFields` being 
not just a set but an ordered set might be useful (_assuming_ that ordering 
would _not_ have a runtime performance cost).
   
   > I assume that your intention behind the ordered set is, that if the user 
wanted to prefetch fields `A, B, C, D` and an exception regarding field `C` 
would happen, they would know that `A` and `B` worked. Did I understand you 
correctly?
   
   I think my thoughts were a bit vague on that actually, thank you for 
encouraging me to organise them a bit more :)
   
   * Depending on how folks name their features and how many features there 
are, if one had to interpret error/exception logging with the details then `fA, 
fC, ... fX, fZ` i.e. alphabetical is (subjectively) more human friendly than 
some random ordering that might arise via the HashSet implementation.
   
   * Given the same inputs in the same order, will two Solr nodes come up with 
the same HashSet ordering for them or not? What if the nodes run different Java 
versions or one node has been running a long time and another is recently 
started? I don't know the answer to those questions but if our two hypethical 
nodes both complained about some features but one complained about `fZ, fA, fX, 
fB` and the other about `fA, fZ, fB, fX` then it might be tricky to interpret 
that they are both complaining about the same thing. Or if there is a 
difference between them then sorted ordering should make it easier to spot what 
the difference is.
   
   * In terms of knowing that a field C exception means that fields A and B 
still worked, no, that's actually not something that had occurred to me. But 
now that you mention it in that way i.e. the potential for one feature or one 
field to affect some but not all of the other features ... I wonder if the 
exception handling could log (say) a WARNing and then do a noPrefetching retry, 
something like this:
   
   ```
     try {
       document = docFetcher.doc(context.docBase + itr.docID(), prefetchFields);
       ...
     } catch {
       ... log warning ...
       try {
         document = docFetcher.doc(context.docBase + itr.docID(), fieldAsSet);
         ... log info to say it worked ...
         ...
       } catch {
         throw exception
       }
     }
   ```
   (And I note that such a fallback-to-no-prefetching mechanism would reduce 
the need for a `disablePrefetching` flag on the request level i.e. the system 
would "self debug" i.e. the unproblematic fields/features would log a warning 
and then an info to indicate that the fallback worked and problematic 
fields/features would log a warning and then presumably would fail again and 
throw an exception after the single-field retry.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[GitHub] [solr] cpoerschke commented on a change in pull request #166: SOLR-15450 add PrefetchingFieldValueFeature

Reply via email to