[GitHub] [solr] cpoerschke commented on a change in pull request #123: [SOLR-12697] Add pure DocValues support to FieldValueFeature

GitBox Fri, 14 May 2021 12:58:56 -0700


cpoerschke commented on a change in pull request #123:
URL: https://github.com/apache/solr/pull/123#discussion_r632759790




##########
File path: solr/contrib/ltr/src/test/org/apache/solr/ltr/TestLTROnSolrCloud.java
##########
@@ -232,58 +249,113 @@ private void createCollection(String name, String 
config, int numShards, int num
     solrCluster.waitForActiveCollection(name, numShards, numShards * 
numReplicas);
   }
 
-
   void indexDocument(String collection, String id, String title, String 
description, int popularity)
-    throws Exception{
+      throws Exception {
     SolrInputDocument doc = new SolrInputDocument();
     doc.setField("id", id);
     doc.setField("title", title);
     doc.setField("description", description);
     doc.setField("popularity", popularity);
+    if (popularity != 1) {
+      // check that empty values will be read as default
+      doc.setField("dvIntField", popularity);
+      doc.setField("dvLongField", popularity);
+      doc.setField("dvFloatField", ((float) popularity) / 10);
+      doc.setField("dvDoubleField", ((double) popularity) / 10);
+      doc.setField("dvStrNumField", popularity);
+      doc.setField("dvStrBoolField", popularity % 2 == 0 ? "T" : "F");
+    }
     solrCluster.getSolrClient().add(collection, doc);
   }
 
-  private void indexDocuments(final String collection)
-       throws Exception {
+  private void indexDocuments(final String collection) throws Exception {
     final int collectionSize = 8;
-    for (int docId = 1; docId <= collectionSize;  docId++) {
+    // put documents in random order to check that advanceExact is working 
correctly
+    List<Integer> docIds = IntStream.rangeClosed(1, 
collectionSize).boxed().collect(toList());

Review comment:
       Nice, I haven't across this `IntStream.rangeClosed` thing before!

##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/FieldValueFeature.java
##########
@@ -146,5 +171,100 @@ public float getMaxScore(int upTo) throws IOException {
         return Float.POSITIVE_INFINITY;
       }
     }
+
+    /**
+     * A FeatureScorer that reads the docValues for a field
+     */
+    public class DocValuesFieldValueFeatureScorer extends 
FeatureWeight.FeatureScorer {
+      final LeafReaderContext context;
+      final DocIdSetIterator docValues;

Review comment:
       > ... `advanceExact()` generally seems to be preferred over `advance()`, 
can you tell me why?
   
   Good question, I don't know!
   
   Intuitively something that returns `boolean` rather than `int` when the 
`int` is not needed seems neater but really we are after performance and not 
necessarily neatness.
   
   `NumericDocValues.longValue()` mentioned `advanceExact()` so that could be a 
clue but not an explanation.
   
   Slightly random code browsing to (say) 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 finds some delegation to 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/lucene/core/src/java/org/apache/lucene/codecs/lucene80/IndexedDISI.java
 and its advanceNext method and mentions of `IndexedDISI` always make me a bit 
_dizzy_ since I don't yet know much about it.
   
   The last commit on 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/lucene/core/src/java/org/apache/lucene/index/NumericDocValues.java
 reads "LUCENE-7462: Give doc values APIs an advanceExact method." and that 
commit and https://issues.apache.org/jira/browse/LUCENE-7462 should have more 
info then.

##########
File path: solr/contrib/ltr/src/test-files/solr/collection1/conf/schema.xml
##########
@@ -18,13 +18,22 @@
 
 <schema name="example" version="1.5">
   <fields>
-    <field name="id" type="string" indexed="true" stored="true" 
required="true" multiValued="false" />
+    <field name="id" type="string" indexed="true" stored="true" 
required="true" multiValued="false"/>
+    <field name="finalScore" type="string" indexed="true" stored="true" 
multiValued="false"/>
+    <field name="finalScoreFloat" type="float" indexed="true" stored="true" 
multiValued="false"/>
+    <field name="field" type="text_general" indexed="true" stored="false" 
multiValued="false"/>
     <field name="title" type="text_general" indexed="true" stored="true"/>
     <field name="description" type="text_general" indexed="true" 
stored="true"/>
     <field name="keywords" type="text_general" indexed="true" stored="true" 
multiValued="true"/>
     <field name="popularity" type="int" indexed="true" stored="true" />

Review comment:
       observation: via the `docValues="${solr.tests.numeric.dv}"` in the 
`<fieldType name="int" ...` and `popularity` being used in the 
`TestFieldValueFeature` we have some existing test coverage for doc values.

##########
File path: solr/contrib/ltr/src/test/org/apache/solr/ltr/TestLTROnSolrCloud.java
##########
@@ -232,58 +249,113 @@ private void createCollection(String name, String 
config, int numShards, int num
     solrCluster.waitForActiveCollection(name, numShards, numShards * 
numReplicas);
   }
 
-
   void indexDocument(String collection, String id, String title, String 
description, int popularity)
-    throws Exception{
+      throws Exception {
     SolrInputDocument doc = new SolrInputDocument();
     doc.setField("id", id);
     doc.setField("title", title);
     doc.setField("description", description);
     doc.setField("popularity", popularity);
+    if (popularity != 1) {
+      // check that empty values will be read as default
+      doc.setField("dvIntField", popularity);
+      doc.setField("dvLongField", popularity);
+      doc.setField("dvFloatField", ((float) popularity) / 10);
+      doc.setField("dvDoubleField", ((double) popularity) / 10);
+      doc.setField("dvStrNumField", popularity);
+      doc.setField("dvStrBoolField", popularity % 2 == 0 ? "T" : "F");
+    }

Review comment:
       I looked at `TestLTROnSolrCloud` changes today, this approach here is 
very elegant, combined with the `resultN_features` and `schema.xml` change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[GitHub] [solr] cpoerschke commented on a change in pull request #123: [SOLR-12697] Add pure DocValues support to FieldValueFeature

Reply via email to