cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519967315



##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.
+ * Team Draft Interleaving produces a fair distribution of ranking models’ 
elements in the final interleaved list.
+ * It has also proved to overcome an issue of the previous implemented 
approach, Balanced interleaving, in determining the winning model[4].

Review comment:
       ```suggestion
    * "Team draft interleaving" has also proved to overcome an issue of the 
"Balanced interleaving" approach, in determining the winning model[4].
   ```
   
   Suggest to avoid the "previous implemented approach" wording since it could 
be misinterpreted to mean that Solr previously had a `BalancedInterleaving` 
class.

##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.

Review comment:
       ```suggestion
    * Team Draft Interleaving implements a method similar to the way in which 
captains select their players in team-matches.
   ```

##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+public class TeamDraftInterleaving implements Interleaving{

Review comment:
       Very nice java docs and comments, thanks!
   
   > ... I think we should raise the exception ...
   
   Yes, you're right, an exception that will lead to a bug being identified and 
fixed rather than silent failure or (confusing for the user) "array out of 
bounds exception" would be better.

##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.
+ * Team Draft Interleaving produces a fair distribution of ranking models’ 
elements in the final interleaved list.
+ * It has also proved to overcome an issue of the previous implemented 
approach, Balanced interleaving, in determining the winning model[4].
+ * <p>
+ * [1] T. Joachims. Optimizing search engines using clickthrough data. KDD 
(2002)
+ * [2] 
T.Joachims.Evaluatingretrievalperformanceusingclickthroughdata.InJ.Franke, G. 
Nakhaeizadeh, and I. Renz, editors,
+ * Text Mining, pages 79–96. Physica/Springer (2003)
+ * [3] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data 
reflect re-
+ * trieval quality? In CIKM, pages 43–52. ACM Press (2008)
+ * [4] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+ * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+ */
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+    // We try to make things reproducible in the context of our tests by 
initializing the random instance
+    // based on the current seed
+    String seed = System.getProperty("tests.seed");
+    if (seed == null) {
+      RANDOM = new Random();
+    } else {
+      RANDOM = new Random(seed.hashCode());
+    }
+  }
+
+  /**
+   * Team Draft Interleaving considers two ranking models: modelA and modelB.
+   * For a given query, each model returns its ranked list of documents La = 
(a1,a2,...) and Lb = (b1, b2, ...).
+   * The algorithm creates a unique ranked list I = (i1, i2, ...).
+   * This list is created by interleaving elements from the two lists la and 
lb as described by Chapelle et al.[1].
+   * Each element Ij is labelled TeamA if it is selected from La and TeamB if 
it is selected from Lb.
+   * <p>
+   * [1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+   * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+   * <p>
+   * Assumptions:
+   * - rerankedA and rerankedB has the same length.
+   * They contains the same search results, ranked differently by two ranking 
models
+   * - each reranked list can not contain the same search result more than 
once.
+   *
+   * @param rerankedA a ranked list of search results produced by a ranking 
model A
+   * @param rerankedB a ranked list of search results produced by a ranking 
model B
+   * @return the interleaved ranking list
+   */
+  public InterleavingResult interleave(ScoreDoc[] rerankedA, ScoreDoc[] 
rerankedB) {
+    LinkedHashSet<ScoreDoc> interleavedResults = new LinkedHashSet<>();
+    ScoreDoc[] interleavedResultArray = new ScoreDoc[rerankedA.length];
+    ArrayList<Set<Integer>> interleavingPicks = new ArrayList<>(2);
+    Set<Integer> teamA = new HashSet<>();
+    Set<Integer> teamB = new HashSet<>();
+    int topN = rerankedA.length;
+    int indexA = 0, indexB = 0;
+
+    while (interleavedResults.size() < topN && indexA < rerankedA.length && 
indexB < rerankedB.length) {
+      if(teamA.size()<teamB.size() || (teamA.size()==teamB.size() && 
!RANDOM.nextBoolean())){
+        indexA = updateIndex(interleavedResults,indexA,rerankedA);
+        interleavedResults.add(rerankedA[indexA]);
+        teamA.add(rerankedA[indexA].doc);
+        indexA++;
+      } else{
+        indexB = updateIndex(interleavedResults,indexB,rerankedB);
+        interleavedResults.add(rerankedB[indexB]);
+        teamB.add(rerankedB[indexB].doc);
+        indexB++;
+      }
+    }
+    interleavingPicks.add(teamA);
+    interleavingPicks.add(teamB);
+    interleavedResultArray = 
interleavedResults.toArray(interleavedResultArray);
+
+    return new InterleavingResult(interleavedResultArray,interleavingPicks);
+  }
+
+  private int updateIndex(LinkedHashSet<ScoreDoc> interleaved, int index, 
ScoreDoc[] reranked) {
+    boolean foundElementToAdd = false;
+    while (index < reranked.length && !foundElementToAdd) {
+      ScoreDoc elementToCheck = reranked[index];
+      if (interleaved.contains(elementToCheck)) {

Review comment:
       Wondering re: `interleaved.contains(elementToCheck)` vs. 
`interleaved.contains(elementToCheck).doc` here. I note that the new 
`ScoreDoc.equals` method considers `doc` and `shardIndex` but not `score` which 
makes sense in the context here but for `ScoreDoc` alone is perhaps less 
obvious. The `interleavingPicks` use `doc` only (i.e. not `doc+shardIndex`). I 
haven't fully thought it through but intuitively if `doc+shardIndex` is 
required here then would it not also be required for the interleaving picks to 
avoid a strange edge case when potentially modelA picks "doc 1 from shard 1" 
and modelB picks "doc 1 from shard 2"?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to