[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522516056



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -210,50 +216,59 @@ public void setContext(ResultContext context) {
   }
   
   // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
-
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
-
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
-
-  scoringQuery = new LTRScoringQuery(lm,
-  LTRQParserPlugin.extractEFIParams(localparams),
-  true,
-  threadManager); // request feature weights to be created for all 
features
-
-}catch (final Exception e) {
-  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "retrieving the feature store "+featureStoreName, e);
-}
-  }
+  rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req);
 
-  if (scoringQuery.getOriginalQuery() == null) {
-scoringQuery.setOriginalQuery(context.getQuery());
+  docsWereNotReranked = (rerankingQueries == null || 
rerankingQueries.length == 0);
+  if (docsWereNotReranked) {
+rerankingQueries = new LTRScoringQuery[]{null};
   }
-  if (scoringQuery.getFeatureLogger() == null){
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
-  }
-  scoringQuery.setRequest(req);
-
-  featureLogger = scoringQuery.getFeatureLogger();
+  modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length];
+  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  for (int i = 0; i < rerankingQueries.length; i++) {
+LTRScoringQuery scoringQuery = rerankingQueries[i];
+if ((scoringQuery == null || !(scoringQuery instanceof 
OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName 
!= null && 
!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {

Review comment:
   > ... I believe the code is much more readable now ... now that part is 
extremely clear.
   
   Yes, I agree, very nice.
   
   > ... another consideration sparkled: ... a separate Jira for that ...
   
   Interesting points, will need to think about them a bit (next week). I agree 
it's unrelated i.e. not a blocker for this pull request here.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522515743



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.
+ * Team Draft Interleaving produces a fair distribution of ranking models’ 
elements in the final interleaved list.
+ * It has also proved to overcome an issue of the previous implemented 
approach, Balanced interleaving, in determining the winning model[4].
+ * 
+ * [1] T. Joachims. Optimizing search engines using clickthrough data. KDD 
(2002)
+ * [2] 
T.Joachims.Evaluatingretrievalperformanceusingclickthroughdata.InJ.Franke, G. 
Nakhaeizadeh, and I. Renz, editors,
+ * Text Mining, pages 79–96. Physica/Springer (2003)
+ * [3] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data 
reflect re-
+ * trieval quality? In CIKM, pages 43–52. ACM Press (2008)
+ * [4] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+ * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+ */
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+// We try to make things reproducible in the context of our tests by 
initializing the random instance
+// based on the current seed
+String seed = System.getProperty("tests.seed");
+if (seed == null) {
+  RANDOM = new Random();
+} else {
+  RANDOM = new Random(seed.hashCode());
+}
+  }
+
+  /**
+   * Team Draft Interleaving considers two ranking models: modelA and modelB.
+   * For a given query, each model returns its ranked list of documents La = 
(a1,a2,...) and Lb = (b1, b2, ...).
+   * The algorithm creates a unique ranked list I = (i1, i2, ...).
+   * This list is created by interleaving elements from the two lists la and 
lb as described by Chapelle et al.[1].
+   * Each element Ij is labelled TeamA if it is selected from La and TeamB if 
it is selected from Lb.
+   * 
+   * [1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+   * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+   * 
+   * Assumptions:
+   * - rerankedA and rerankedB has the same length.
+   * They contains the same search results, ranked differently by two ranking 
models
+   * - each reranked list can not contain the same search result more than 
once.
+   *
+   * @param rerankedA a ranked list of search results produced by a ranking 
model A
+   * @param rerankedB a ranked list of search results produced by a ranking 
model B
+   * @return the interleaved ranking list
+   */
+  public InterleavingResult interleave(ScoreDoc[] rerankedA, ScoreDoc[] 
rerankedB) {
+LinkedHashSet interleavedResults = new LinkedHashSet<>();
+ScoreDoc[] interleavedResultArray = new ScoreDoc[rerankedA.length];
+ArrayList> interleavingPicks = new ArrayList<>(2);
+Set teamA = new HashSet<>();
+Set teamB = new HashSet<>();
+int topN = rerankedA.length;
+int indexA = 0, indexB = 0;
+
+while (interleavedResults.size() < topN && indexA < rerankedA.length && 
indexB < rerankedB.length) {
+  if(teamA.size() interleaved, int index, 
ScoreDoc[] reranked) {
+boolean foundElementToAdd = false;
+while (index < reranked.length && !foundElementToAdd) {
+  ScoreDoc elementToCheck = reranked[index];
+  if (interleaved.contains(elementToCheck)) {

Review comment:
   > ... currently Interleaving doesn't support sharding ...
   
   Let's include that in the documentation somehow, e.g. 
https://github.com/cp

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-12 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522486838



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.length == 0);
+  String transformerFeatureStore = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  Map transformerExternalFeatureInfo = 
LTRQParserPlugin.extractEFIParams(localparams);
 
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
+  initLoggingModel(transformerFeatureStore);

Review comment:
   LTRFeatureLoggerTransformerFactory.2 - `loggingModel` being a member of 
the transformer factory gives it `SolrCore` lifetime/scope but here it's 
initialised based on per-request parameters. If multiple threads use the same 
transformer factory object concurrently then they might trampled upon each 
other. 
https://github.com/cpoerschke/lucene-solr/commit/4912daccd596435f5c61ac1a3cf86eaebb039118
 proposes to not have the logging model as a member of the transformer factory.

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -79,6 +81,7 @@
   private char csvFeatureSeparator = 
CSVFeatureLogger.DEFAULT_FEATURE_SEPARATOR;
 
   private LTRThreadModule threadManager = null;
+  private LoggingModel loggingModel = null;

Review comment:
   LTRFeatureLoggerTransformerFactory.1 - `loggingModel` is a member of of 
the transformer factory here.

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }

Review comment:
   LTRFeatureLoggerTransformerFactory.3 - I noted that `threadManager` here 
is an existing member of the transformer factory and it is initialised as part 
of request processing. Since there's no locking or anything there could be a 
chance that multiple threads concurrently call `threadManager.setExecutor()` 
but the argument to the set call is not specific to the request i.e. all 
requests would set the same thing (whereas for the logging model different 
requests could supply a different feature store name via the `fl=[feature 
store=...]` parameter).

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.le

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-09 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519967315



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.
+ * Team Draft Interleaving produces a fair distribution of ranking models’ 
elements in the final interleaved list.
+ * It has also proved to overcome an issue of the previous implemented 
approach, Balanced interleaving, in determining the winning model[4].

Review comment:
   ```suggestion
* "Team draft interleaving" has also proved to overcome an issue of the 
"Balanced interleaving" approach, in determining the winning model[4].
   ```
   
   Suggest to avoid the "previous implemented approach" wording since it could 
be misinterpreted to mean that Solr previously had a `BalancedInterleaving` 
class.

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.

Review comment:
   ```suggestion
* Team Draft Interleaving implements a method similar to the way in which 
captains select their players in team-matches.
   ```

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+imp

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-09 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519944462



##
File path: solr/solr-ref-guide/src/learning-to-rank.adoc
##
@@ -247,6 +254,81 @@ The output XML will include feature values as a 
comma-separated list, resembling
   }}
 
 
+=== Running a Rerank Query Interleaving Two Models
+
+To rerank the results of a query, interleaving two models (myModelA, myModelB) 
add the `rq` parameter to your search, passing two models in input, for example:
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA 
model=myModelB reRankDocs=100}&fl=id,score
+
+To obtain the model that interleaving picked for a search result, computed 
during reranking, add `[interleaving]` to the `fl` parameter, for example:

Review comment:
   question: if myModelA had `[ doc1, doc2, doc3 ]` document order and 
myModelB had `[ doc1, doc3, doc2 ]` document order i.e. there was agreement 
between the models re: the first document, will `[interleaving]` return (1) 
randomly `myModelA` or `myModelB` depending on how the picking actually 
happened or will it return (2) something else e.g. `myModelA,myModelB` (if 
myModelA actually picked and myModelB agreed) or `myModelB,myModelA` (if 
myModelB actually picked and myModelA agreed) or will it return (3) neither 
since in a way neither of them picked the document since they both agreed on it?
   
   answer-ish: from recalling the implementation the answer is (1) i think 
though from a user's perspective perhaps it might be nice here to clarify here 
somehow around that? a subtle aspect being (if i understand things right) that 
`[features]` and `[interleaving]` could both be requested in the `fl` and 
whilst myModelA and myModelB might have agreed that `doc1` should be the first 
document they might have used very different features to arrived at that 
conclusion and their `score` value could also differ.

##
File path: solr/solr-ref-guide/src/learning-to-rank.adoc
##
@@ -247,6 +254,81 @@ The output XML will include feature values as a 
comma-separated list, resembling
   }}
 
 
+=== Running a Rerank Query Interleaving Two Models
+
+To rerank the results of a query, interleaving two models (myModelA, myModelB) 
add the `rq` parameter to your search, passing two models in input, for example:
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA 
model=myModelB reRankDocs=100}&fl=id,score
+
+To obtain the model that interleaving picked for a search result, computed 
during reranking, add `[interleaving]` to the `fl` parameter, for example:
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA 
model=myModelB reRankDocs=100}&fl=id,score,[interleaving]
+
+The output XML will include the model picked for each search result, 
resembling the output shown here:
+
+[source,json]
+
+{
+  "responseHeader":{
+"status":0,
+"QTime":0,
+"params":{
+  "q":"test",
+  "fl":"id,score,[interleaving]",
+  "rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}},
+  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
+  {
+"id":"GB18030TEST",
+"score":1.0005897,
+"[interleaving]":"myModelB"},
+  {
+"id":"UTF8TEST",
+"score":0.79656565,
+"[interleaving]":"myModelA"}]
+  }}
+
+
+=== Running a Rerank Query Interleaving a model with the original ranking
+When approaching Search Quality Evaluation with interleaving it may be useful 
to compare a model with the original ranking. 
+To rerank the results of a query, interleaving a model with the original 
ranking, add the `rq` parameter to your search, with a model in input and 
activating the original ranking interleaving, for example:
+
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel 
model=_OriginalRanking_ reRankDocs=100}&fl=id,score

Review comment:
   subjective: might `model=_OriginalRanking_ model=myModel` be more 
intuitive i.e. the 'from' baseline model on the left and the 'to' alternative 
model on the right? (i recall that the code had an "original ranking last" 
assumption before but if that's gone there's a possibility here to swap the 
order)

##
File path: solr/solr-ref-guide/src/learning-to-rank.adoc
##
@@ -418,6 +500,14 @@ Learning-To-Rank is a contrib module and therefore its 
plugins must be configure
 
 
 
+* Declaration of the `[interleaving]` transformer.
++
+[source,xml]
+
+
+

Review comment:
   minor/subjective: could shorten since there's no parameters
   
   ```
   
   ```

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the 

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-06 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518725166



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+// We try to make things reproducible in the context of our tests by 
initializing the random instance
+// based on the current seed
+String seed = System.getProperty("tests.seed");

Review comment:
   
https://github.com/cpoerschke/lucene-solr/commit/63a190b57fdc892b98c546f005edbc096e4e4095
 is part 2 of 2: it removes the tests.seed System property use in 
TeamDraftInterleaving by passing a Random object into TeamDraftInterleaving. 
instead the tests.seed System property use is in a test class 
(LTRQParserTestPlugin) and tests continue to make their `setRANDOM` calls but 
instead of `TeamDraftInterleaving.setRANDOM` it's now 
`LTRQParserTestPlugin.setRANDOM` then.
   
   Two problems with this though:
   * a forbidden apis check fails (saying that RandomizedRunner's random() 
should be used instead but i'm unclear still on how that might be done)
   * the tests no longer pass, which baffles me though haven't looked at 
details yet; speculations:
 * was the randomness in the tests predictable before and now it's 
predictable also but the number sequence has changed and test expectations need 
to be adjusted to match?
 * something to do with order of system property setting and non-test/test 
class loading perhaps and perhaps use of the RandomizedRunner's random would 
solve it somehow?
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-06 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518659433



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+// We try to make things reproducible in the context of our tests by 
initializing the random instance
+// based on the current seed
+String seed = System.getProperty("tests.seed");

Review comment:
   Yes, predictable randomness in tests can be tricky! I had an (unplanned) 
idea around this at breakfast this morning, trying it out now: 
https://github.com/cpoerschke/lucene-solr/commit/5fc99898f9bc82f4dabcfe188e132ba2bc727704
 is part 1 of 2 (and is independent technically speaking).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-05 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518364159



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {
+  final LTRScoringModel ltrScoringModel = mr.getModel(modelNames[i]);
+  if (ltrScoringModel == null) {
+throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
+"cannot find " + LTRQParserPlugin.MODEL + " " + modelNames[i]);
+  }
+  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
+  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
+  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);// Check if features 
are requested and if the model feature store and feature-transform feature 
store are the same
+  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures : false;
+  if (threadManager != null) {
+
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
+  }
+  rerankingQuery = new LTRScoringQuery(ltrScoringModel,
+  extractEFIParams(localParams),
+  featuresRequestedFromSameStore, threadManager);
+
+  // Enable the feature vector caching if we are extracting features, 
and the features
+  // we requested are the same ones we are reranking with
+  if (featuresRequestedFromSameStore) {
+rerankingQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+  }
+}else{
+  rerankingQuery = new LTRScoringQuery(null);
+}
+
+// External features
+rerankingQuery.setRequest(req);
+rerankingQueries[i] = rerankingQuery;
   }
-  SolrQueryRequestContextUtils.setScoringQuery(req, scoringQuery);
 
+  SolrQueryRequestContextUtils.setScoringQuery(req, rerankingQueries);
   int reRankDocs = localParams.getInt(RERANK_DOCS, DEFAULT_RERANK_DOCS);
   if (reRankDocs <= 0) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "Must rerank at least 1 document");
+"Must rerank at least 1 document");
+  }
+  if (rerankingQueries.length == 1) {
+return new LTRQuery(rerankingQueries[0], reRankDocs);
+  } else {
+return new LTRQuery(rerankingQueries, reRankDocs);
   }
-
-  // External features
-  scoringQuery.setRequest(req);
-
-  return new LTRQuery

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-05 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518217755



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -210,50 +216,59 @@ public void setContext(ResultContext context) {
   }
   
   // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
-
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
-
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
-
-  scoringQuery = new LTRScoringQuery(lm,
-  LTRQParserPlugin.extractEFIParams(localparams),
-  true,
-  threadManager); // request feature weights to be created for all 
features
-
-}catch (final Exception e) {
-  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "retrieving the feature store "+featureStoreName, e);
-}
-  }
+  rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req);
 
-  if (scoringQuery.getOriginalQuery() == null) {
-scoringQuery.setOriginalQuery(context.getQuery());
+  docsWereNotReranked = (rerankingQueries == null || 
rerankingQueries.length == 0);
+  if (docsWereNotReranked) {
+rerankingQueries = new LTRScoringQuery[]{null};
   }
-  if (scoringQuery.getFeatureLogger() == null){
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
-  }
-  scoringQuery.setRequest(req);
-
-  featureLogger = scoringQuery.getFeatureLogger();
+  modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length];
+  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  for (int i = 0; i < rerankingQueries.length; i++) {
+LTRScoringQuery scoringQuery = rerankingQueries[i];
+if ((scoringQuery == null || !(scoringQuery instanceof 
OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName 
!= null && 
!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {

Review comment:
   12/n observations/thoughts/questions:
   
   Most tricky to articulate, hence left until last.
   
   Prior to interleaving the existing logic is that if feature vectors are 
requested and there is no model (or the model is for a different feature store) 
then a logging model is created.
   
   So now then if we have two models:
   * if both models are for the requested feature store then that's great and 
each document would have been picked by one of the models and so we use the 
feature vector already previously calculated by whatever model had picked the 
document.
   * if neither model is for the requested feature store then we need to create 
a logging model, is one logging model sufficient or do we need two? intuitively 
to me one would seem to be sufficient but that's based on partial analysis only 
so far.
   * if one of the two models (modelA) is for the requested feature store then 
for the documents picked by modelA we can use the feature vector already 
previously calculated by modelA. what about documents picked by modelB? it 
could be that modelA actually has the feature vector for that document but that 
modelB simply managed to pick the document first. or if modelA does not have 
the feature vector then we could calculate it for modelA. would a logging model 
help in this scenario? intuitively to me it would seem that calculating the 
missing feature vector via modelA or via the logging model would both be 
equally efficient and hence no logging model would be needed but again that's 
only based on partial analysis so far.

##
File path: solr/solr-ref-guide/src/learning-to-rank.adoc
##
@@ -247,6 +254,81 @@ The output XML will include feature values as a 
comma-separated list, resembling
   }}
 
 
+=== Running a Rerank Query Interleaving Two Models
+
+To rerank the results of a query, interleaving two models (myModelA, myModelB) 
add the `rq` parameter to your search, passing two models in input, for example:
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA 
model=myModelB reRankDocs=100}&fl=id,score
+
+To obt

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-05 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518156348



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRInterleavingRescorer.java
##
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.ltr;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.Explanation;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.TopDocs;
+import org.apache.solr.ltr.interleaving.Interleaving;
+import org.apache.solr.ltr.interleaving.InterleavingResult;
+import org.apache.solr.ltr.interleaving.TeamDraftInterleaving;
+
+/**
+ * Implements the rescoring logic. The top documents returned by solr with 
their
+ * original scores, will be processed by a {@link LTRScoringQuery} that will 
assign a
+ * new score to each document. The top documents will be resorted based on the
+ * new score.
+ * */
+public class LTRInterleavingRescorer extends LTRRescorer {
+  
+  LTRScoringQuery[] rerankingQueries;
+  Interleaving interleavingAlgorithm = new TeamDraftInterleaving();

Review comment:
   11.1/n If we go with 4/n then `LTRQParserPlugin.LTRQParser` could pass a 
`Interleaving interleavingAlgorithm` argument to the `LTRInterleavingQuery` 
constructor which could pass it to the `LTRInterleavingRescorer` constructor. 
For now `TeamDraftInterleaving` would be the only supported algorithm but in 
future other algorithms could then easily be added e.g. based on an additional 
`ltr` parameter. What do you think? An easy change to make now or something 
better left for later?

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+// We try to make things reproducible in the context of our tests by 
initializing the random instance
+// based on the current seed
+String seed = System.getProperty("tests.seed");

Review comment:
   11.2/n `LTRQParserPlugin.LTRQParser` also has access to the 
`SolrQueryRequest` and its `SolrCore` object. For some reason I thought that 
within that some 'official' source of random-ness might be available which 
could be passed to a `TeamDraftInterleaving(Random)` constructor. And I 
imagined that our test harnesses would use seeds to make tests reproducible 
w.r.t. that 'official' source of random-ness. There however doesn't appear to 
be such a source of non-test official random-ness? 
`System.getProperty("tests.seed");` being used/available to non-test code seems 
potentially tricky.
   
   @dweiss would you perhaps have any insights around non-test sources of 
randomness?
   





This is an automated message from the Apache Git Service.
To respond to the message, p

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-05 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518035475



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRInterleavingRescorer.java
##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.ltr;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.Explanation;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.TopDocs;
+import org.apache.solr.ltr.interleaving.Interleaving;
+import org.apache.solr.ltr.interleaving.InterleavingResult;
+import org.apache.solr.ltr.interleaving.TeamDraftInterleaving;
+
+import static org.apache.solr.ltr.search.LTRQParserPlugin.isOriginalRanking;
+
+/**
+ * Implements the rescoring logic. The top documents returned by solr with 
their
+ * original scores, will be processed by a {@link LTRScoringQuery} that will 
assign a
+ * new score to each document. The top documents will be resorted based on the
+ * new score.
+ * */
+public class LTRInterleavingRescorer extends LTRRescorer {
+  
+  LTRScoringQuery[] rerankingQueries;
+  Interleaving interleavingAlgorithm = new TeamDraftInterleaving();
+  
+  public LTRInterleavingRescorer(LTRScoringQuery[] rerankingQueries) {
+this.rerankingQueries = rerankingQueries;
+  }
+
+  /**
+   * rescores the documents:
+   *
+   * @param searcher
+   *  current IndexSearcher
+   * @param firstPassTopDocs
+   *  documents to rerank;
+   * @param topN
+   *  documents to return;
+   */
+  @Override
+  public TopDocs rescore(IndexSearcher searcher, TopDocs firstPassTopDocs,
+  int topN) throws IOException {
+if ((topN == 0) || (firstPassTopDocs.scoreDocs.length == 0)) {
+  return firstPassTopDocs;
+}
+
+int originalRankingIndex = -1;

Review comment:
   10/n suggestions:
   * calculate originalRankingIndex in the constructor
   * use originalRankingIndex to minimise "rerankingQueries[i] is not original 
ranking" checks
   * use originalRankingIndex to remove the "original ranking query is always 
the last query" assumption
   * the 'rerank' method already takes a `searcher` and so it could determine 
its own `leaves` from that
   
   
https://github.com/cpoerschke/lucene-solr/commit/002c31cceaeff411ad35111f00b7cee71c5b11de





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-05 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517960447



##
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##
@@ -166,64 +186,77 @@ public void scoreFeatures(IndexSearcher indexSearcher, 
TopDocs firstPassTopDocs,
 docBase = readerContext.docBase;
 scorer = modelWeight.scorer(readerContext);
   }
-  // Scorer for a LTRScoringQuery.ModelWeight should never be null since 
we always have to
-  // call score
-  // even if no feature scorers match, since a model might use that info to
-  // return a
-  // non-zero score. Same applies for the case of advancing a 
LTRScoringQuery.ModelWeight.ModelScorer
-  // past the target
-  // doc since the model algorithm still needs to compute a potentially
-  // non-zero score from blank features.
-  assert (scorer != null);
-  final int targetDoc = docID - docBase;
-  scorer.docID();
-  scorer.iterator().advance(targetDoc);
-
-  scorer.getDocInfo().setOriginalDocScore(hit.score);
-  hit.score = scorer.score();
-  if (hitUpto < topN) {
-reranked[hitUpto] = hit;
-// if the heap is not full, maybe I want to log the features for this
-// document
+  scoreSingleHit(indexSearcher, topN, modelWeight, docBase, hitUpto, hit, 
docID, scoringQuery, scorer, reranked);

Review comment:
   9/n observation and suggestions:
   * very elegant factoring out of methods for use by `LTRInterleavingRescorer`
   * the `rerank` method already takes a `searcher` and so it could determine 
its own `leaves` from that
   * `this.scoringQuery` being passed to the `scoreSingleHit` method as an 
argument (rather than it using the `this.scoringQuery` directly) is very 
subtle. it is of course present as an argument because 
`LTRInterleavingRescorer` will passing `rerankingQueries[i]` for that argument. 
the subtlety could be removed by making `scoreSingleHit` a static method.
   
   
https://github.com/cpoerschke/lucene-solr/commit/16512dbddbabe12ca5a2a26a9180ceeaae62cea2





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-05 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517889175



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {

Review comment:
   7/n minor comments:
   * technically not just `modelNames[0]` but all the model names could be 
`isEmpty` checked
   * `threadManager.setExecutor` (already mentioned in 6/n) need not be inside 
the loop, likewise `extractFeatures` and `fvStoreName` and 
`extractEFIParams(localParams)` could move out.
   

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+public class TeamDraftInterleaving implements Interleaving{

Review comment:
   8/n suggestions:
   * class level javadocs re: the interleaving algorithm
   * comments or javadocs re: any assumptions e.g.
 * must rerankedA.length and rerankedB.length match?
 * can rerankedA and rerankedB contain the same docs?
 * can rerankedA contain the same doc more than once?
 * can rerankedB contain the same doc more than once?
   * consider guarding against array-index-out-of-bounds exceptions (even if 
they shouldn't happen if all assumptions are met)
   
   ```
   indexA = updateIndex(interleavedResults,indexA,rerankedA);
   if (indexA < rerankedA.length) {
 interleavedResults.add(rerankedA[indexA]);
 teamA.add(rerankedA[indexA].doc);
 indexA++;
   }
   ```
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-04 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517514900



##
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java
##
@@ -73,6 +74,8 @@
   final private Map efi;
   // Original solr query used to fetch matching documents
   private Query originalQuery;
+  // Model was picked for this Docs
+  private Set pickedInterleavingDocIds;

Review comment:
   4.1/n The addition of a new member here which is only sometimes 
applicable got me wondering about possibly having `LTRInterleavingScoringQuery 
extends LTRScoringQuery` inheritance.

##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {
+  final LTRScoringModel ltrScoringModel = mr.getModel(modelNames[i]);
+  if (ltrScoringModel == null) {
+throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
+"cannot find " + LTRQParserPlugin.MODEL + " " + modelNames[i]);
+  }
+  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
+  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
+  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);// Check if features 
are requested and if the model feature store and feature-transform feature 
store are the same
+  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures : false;
+  if (threadManager != null) {
+
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
+  }
+  rerankingQuery = new LTRScoringQuery(ltrScoringModel,
+  extractEFIParams(localParams),
+  featuresRequestedFromSameStore, threadManager);
+
+  // Enable the feature vector caching if we are extracting features, 
and the features
+  // we requested are the same ones we are reranking with
+  if (featuresRequestedFromSameStore) {
+rerankingQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+  }
+}else{
+  rerankingQuery = new LTRScoringQuery(null);
+}
+
+// External features
+rerankingQuery.setRequest(req);
+rerankingQueries[i] = rerankingQuery;
   }
-  SolrQueryRequestContextUtils.setScoringQuery(req, scoringQuery);
 
+  SolrQueryRequestContextUtils.setScoringQuery(req, rerankingQueries);
   int reRankDocs = localParams.getInt(RERAN

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-04 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517502678



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {

Review comment:
   Hi @alessandrobenedetti! Apologies for not returning to here sooner.
   
   Good point about Apache Solr already using "special names" in various 
places, sure, let's go with "_OriginalRanking_" as the special name then. Hmm, 
the UI treats underscore special here, so `"_OriginalRanking_"` is what's in 
the code currently.
   
   > ... we could move the review to the re-scoring part, that is more delicate 
...
   
   "delicate" is an excellent word choice, thank you, I spent some pleasant 
time today continuing to learn more about the re-scoring part. Took a sort of 
outside-to-inside approach i.e. starting at `LTRQParserPlugin` and then looking 
at what changed and how the bits fit together.
   
   If it's okay with you I'll push a branch to my fork and add 
comments/questions sometimes with commit links here, not quite sure if that 
might work, let's try?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-10-02 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r498948018



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {

Review comment:
   Ah, good point about the special "OriginalRanking" also appearing in the 
"[interleaving]" transformer!
   
   When using interleaving there's always at least two models to be 
interleaved, right?
   
   The models could all be actual models or one of them could be the 
"OriginalRanking" pseudo-model.
   
   I wonder if class inheritance might help us e.g.
   
   ```
   class InterleavingLTRQParserPlugin extends LTRQParserPlugin
   ```
   
   and (say)
   
   ```
   
   
   ```
   
   where
   
   ```
   rq={!iltr model=myModelXYZ}
   ```
   
   can be a convenience equivalent to
   
   ```
   rq={!iltr model=OriginalRanking model=myModelXYZ}
   ```
   
   i.e. if only one model is supplied then it is implied that the second model 
is the original ranking.
   
   And if the special "OriginalRanking" name doesn't suit someone (either 
because they already have a real model that happens to be called 
"OriginalRanking" or because they would prefer a different descriptor in the 
"[interleaving]" transformer output) then something like
   
   ```
   rq={!iltr originalRankingModel=noModel model=noModel model=someModel}
   ```
   
   would allow them to call the "OriginalRanking" something else e.g. "noModel" 
instead.
   
   
   We could even reject any "OriginalRanking" models that are actual models via 
something like
   
   ```
   final String originalRankingModelName = 
localParams.get("originalRankingModel", "OriginalRanking" /* default */);
   if (null != mr.getModel(originalRankingModelName)) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Found an actual '" + originalRankingModelName + 
"' model, please ...
   }
   ```
   
   
   
   However, the "[interleaving]" transformer still needs to know about the 
special name, hmm.
   
   
   At the moment we have
   
   ```
   public static boolean isOriginalRanking(LTRScoringQuery rerankingQuery){
 return rerankingQuery.getScoringModel() == null;
   }
   ```
   
   in LTRQParserPlugin and in LTRInterleavingTransformerFactory
   
   ```
   if (isOriginalRanking(rerankingQuery)) {
 doc.addField(name, ORIGINAL_RANKING);
   } else {
 doc.addField(name, rerankingQuery.getScoringModel().getName());
   }
   ```
   
   and if we gave LTRScoringQuery a getScoringModelName method
   
   ```
   public String getScoringModelName() {
 return ltrScoringModel.getName();
   }
   ```

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-07-29 Thread GitBox


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r462456864



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java
##
@@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, 
SolrParams params,
 @Override
 public Query parse() throws SyntaxError {
   // ReRanking Model
-  final String modelName = localParams.get(LTRQParserPlugin.MODEL);
-  if ((modelName == null) || modelName.isEmpty()) {
+  final String[] modelNames = 
localParams.getParams(LTRQParserPlugin.MODEL);
+  if ((modelNames == null) || modelNames.length==0 || 
modelNames[0].isEmpty()) {
 throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
 "Must provide model in the request");
   }
-
-  final LTRScoringModel ltrScoringModel = mr.getModel(modelName);
-  if (ltrScoringModel == null) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-"cannot find " + LTRQParserPlugin.MODEL + " " + modelName);
-  }
-
-  final String modelFeatureStoreName = 
ltrScoringModel.getFeatureStoreName();
-  final boolean extractFeatures = 
SolrQueryRequestContextUtils.isExtractingFeatures(req);
-  final String fvStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  // Check if features are requested and if the model feature store and 
feature-transform feature store are the same
-  final boolean featuresRequestedFromSameStore = 
(modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? 
extractFeatures:false;
-  if (threadManager != null) {
-
threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
-  }
-  final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel,
-  extractEFIParams(localParams),
-  featuresRequestedFromSameStore, threadManager);
-
-  // Enable the feature vector caching if we are extracting features, and 
the features
-  // we requested are the same ones we are reranking with
-  if (featuresRequestedFromSameStore) {
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
+ 
+  LTRScoringQuery[] rerankingQueries = new 
LTRScoringQuery[modelNames.length];
+  for (int i = 0; i < modelNames.length; i++) {
+final LTRScoringQuery rerankingQuery;
+if (!ORIGINAL_RANKING.equals(modelNames[i])) {

Review comment:
   The model name `originalRanking` is being given a special meaning here. 
I wonder if perhaps the differences between models could be transferred to the 
parameter names somehow (e.g. a new `original_model` parameter name alongside 
the existing `model` parameter name)? Then users could choose any model name 
they like, including for "original ranking" purposes.
   
   https://github.com/apache/lucene-solr/pull/1705 proposes to factor out a 
`LTRQParserPlugin.newLTRScoringQuery` method (but I haven't yet explored fully 
w.r.t. how that might connect up here w.r.t. additional parameter names).
   
   What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org