[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522516056 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -210,50 +216,59 @@ public void setContext(ResultContext context) { } // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger - -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); - - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features - -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } + rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req); - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); + docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0); + if (docsWereNotReranked) { +rerankingQueries = new LTRScoringQuery[]{null}; } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length]; + String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); + for (int i = 0; i < rerankingQueries.length; i++) { +LTRScoringQuery scoringQuery = rerankingQueries[i]; +if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { Review comment: > ... I believe the code is much more readable now ... now that part is extremely clear. Yes, I agree, very nice. > ... another consideration sparkled: ... a separate Jira for that ... Interesting points, will need to think about them a bit (next week). I agree it's unrelated i.e. not a blocker for this pull request here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522515743 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. + * Here the authors implement a method similar to the way in which captains select their players in team-matches. + * Team Draft Interleaving produces a fair distribution of ranking models’ elements in the final interleaved list. + * It has also proved to overcome an issue of the previous implemented approach, Balanced interleaving, in determining the winning model[4]. + * + * [1] T. Joachims. Optimizing search engines using clickthrough data. KDD (2002) + * [2] T.Joachims.Evaluatingretrievalperformanceusingclickthroughdata.InJ.Franke, G. Nakhaeizadeh, and I. Renz, editors, + * Text Mining, pages 79–96. Physica/Springer (2003) + * [3] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data reflect re- + * trieval quality? In CIKM, pages 43–52. ACM Press (2008) + * [4] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. + * Large-scale validation and analysis of interleaved search evaluation. ACM TOIS, 30(1):1–41, Feb. (2012) + */ +public class TeamDraftInterleaving implements Interleaving{ + public static Random RANDOM; + + static { +// We try to make things reproducible in the context of our tests by initializing the random instance +// based on the current seed +String seed = System.getProperty("tests.seed"); +if (seed == null) { + RANDOM = new Random(); +} else { + RANDOM = new Random(seed.hashCode()); +} + } + + /** + * Team Draft Interleaving considers two ranking models: modelA and modelB. + * For a given query, each model returns its ranked list of documents La = (a1,a2,...) and Lb = (b1, b2, ...). + * The algorithm creates a unique ranked list I = (i1, i2, ...). + * This list is created by interleaving elements from the two lists la and lb as described by Chapelle et al.[1]. + * Each element Ij is labelled TeamA if it is selected from La and TeamB if it is selected from Lb. + * + * [1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. + * Large-scale validation and analysis of interleaved search evaluation. ACM TOIS, 30(1):1–41, Feb. (2012) + * + * Assumptions: + * - rerankedA and rerankedB has the same length. + * They contains the same search results, ranked differently by two ranking models + * - each reranked list can not contain the same search result more than once. + * + * @param rerankedA a ranked list of search results produced by a ranking model A + * @param rerankedB a ranked list of search results produced by a ranking model B + * @return the interleaved ranking list + */ + public InterleavingResult interleave(ScoreDoc[] rerankedA, ScoreDoc[] rerankedB) { +LinkedHashSet interleavedResults = new LinkedHashSet<>(); +ScoreDoc[] interleavedResultArray = new ScoreDoc[rerankedA.length]; +ArrayList> interleavingPicks = new ArrayList<>(2); +Set teamA = new HashSet<>(); +Set teamB = new HashSet<>(); +int topN = rerankedA.length; +int indexA = 0, indexB = 0; + +while (interleavedResults.size() < topN && indexA < rerankedA.length && indexB < rerankedB.length) { + if(teamA.size() interleaved, int index, ScoreDoc[] reranked) { +boolean foundElementToAdd = false; +while (index < reranked.length && !foundElementToAdd) { + ScoreDoc elementToCheck = reranked[index]; + if (interleaved.contains(elementToCheck)) { Review comment: > ... currently Interleaving doesn't support sharding ... Let's include that in the documentation somehow, e.g. https://github.com/cp
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522486838 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); Review comment: LTRFeatureLoggerTransformerFactory.2 - `loggingModel` being a member of the transformer factory gives it `SolrCore` lifetime/scope but here it's initialised based on per-request parameters. If multiple threads use the same transformer factory object concurrently then they might trampled upon each other. https://github.com/cpoerschke/lucene-solr/commit/4912daccd596435f5c61ac1a3cf86eaebb039118 proposes to not have the logging model as a member of the transformer factory. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -79,6 +81,7 @@ private char csvFeatureSeparator = CSVFeatureLogger.DEFAULT_FEATURE_SEPARATOR; private LTRThreadModule threadManager = null; + private LoggingModel loggingModel = null; Review comment: LTRFeatureLoggerTransformerFactory.1 - `loggingModel` is a member of of the transformer factory here. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } Review comment: LTRFeatureLoggerTransformerFactory.3 - I noted that `threadManager` here is an existing member of the transformer factory and it is initialised as part of request processing. Since there's no locking or anything there could be a chance that multiple threads concurrently call `threadManager.setExecutor()` but the argument to the set call is not specific to the request i.e. all requests would set the same thing (whereas for the logging model different requests could supply a different feature store name via the `fl=[feature store=...]` parameter). ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.le
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519967315 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. + * Here the authors implement a method similar to the way in which captains select their players in team-matches. + * Team Draft Interleaving produces a fair distribution of ranking models’ elements in the final interleaved list. + * It has also proved to overcome an issue of the previous implemented approach, Balanced interleaving, in determining the winning model[4]. Review comment: ```suggestion * "Team draft interleaving" has also proved to overcome an issue of the "Balanced interleaving" approach, in determining the winning model[4]. ``` Suggest to avoid the "previous implemented approach" wording since it could be misinterpreted to mean that Solr previously had a `BalancedInterleaving` class. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. + * Here the authors implement a method similar to the way in which captains select their players in team-matches. Review comment: ```suggestion * Team Draft Interleaving implements a method similar to the way in which captains select their players in team-matches. ``` ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +imp
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519944462 ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example: Review comment: question: if myModelA had `[ doc1, doc2, doc3 ]` document order and myModelB had `[ doc1, doc3, doc2 ]` document order i.e. there was agreement between the models re: the first document, will `[interleaving]` return (1) randomly `myModelA` or `myModelB` depending on how the picking actually happened or will it return (2) something else e.g. `myModelA,myModelB` (if myModelA actually picked and myModelB agreed) or `myModelB,myModelA` (if myModelB actually picked and myModelA agreed) or will it return (3) neither since in a way neither of them picked the document since they both agreed on it? answer-ish: from recalling the implementation the answer is (1) i think though from a user's perspective perhaps it might be nice here to clarify here somehow around that? a subtle aspect being (if i understand things right) that `[features]` and `[interleaving]` could both be requested in the `fl` and whilst myModelA and myModelB might have agreed that `doc1` should be the first document they might have used very different features to arrived at that conclusion and their `score` value could also differ. ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score,[interleaving] + +The output XML will include the model picked for each search result, resembling the output shown here: + +[source,json] + +{ + "responseHeader":{ +"status":0, +"QTime":0, +"params":{ + "q":"test", + "fl":"id,score,[interleaving]", + "rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}}, + "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[ + { +"id":"GB18030TEST", +"score":1.0005897, +"[interleaving]":"myModelB"}, + { +"id":"UTF8TEST", +"score":0.79656565, +"[interleaving]":"myModelA"}] + }} + + +=== Running a Rerank Query Interleaving a model with the original ranking +When approaching Search Quality Evaluation with interleaving it may be useful to compare a model with the original ranking. +To rerank the results of a query, interleaving a model with the original ranking, add the `rq` parameter to your search, with a model in input and activating the original ranking interleaving, for example: + + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel model=_OriginalRanking_ reRankDocs=100}&fl=id,score Review comment: subjective: might `model=_OriginalRanking_ model=myModel` be more intuitive i.e. the 'from' baseline model on the left and the 'to' alternative model on the right? (i recall that the code had an "original ranking last" assumption before but if that's gone there's a possibility here to swap the order) ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -418,6 +500,14 @@ Learning-To-Rank is a contrib module and therefore its plugins must be configure +* Declaration of the `[interleaving]` transformer. ++ +[source,xml] + + + Review comment: minor/subjective: could shorten since there's no parameters ``` ``` ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518725166 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +public class TeamDraftInterleaving implements Interleaving{ + public static Random RANDOM; + + static { +// We try to make things reproducible in the context of our tests by initializing the random instance +// based on the current seed +String seed = System.getProperty("tests.seed"); Review comment: https://github.com/cpoerschke/lucene-solr/commit/63a190b57fdc892b98c546f005edbc096e4e4095 is part 2 of 2: it removes the tests.seed System property use in TeamDraftInterleaving by passing a Random object into TeamDraftInterleaving. instead the tests.seed System property use is in a test class (LTRQParserTestPlugin) and tests continue to make their `setRANDOM` calls but instead of `TeamDraftInterleaving.setRANDOM` it's now `LTRQParserTestPlugin.setRANDOM` then. Two problems with this though: * a forbidden apis check fails (saying that RandomizedRunner's random() should be used instead but i'm unclear still on how that might be done) * the tests no longer pass, which baffles me though haven't looked at details yet; speculations: * was the randomness in the tests predictable before and now it's predictable also but the number sequence has changed and test expectations need to be adjusted to match? * something to do with order of system property setting and non-test/test class loading perhaps and perhaps use of the RandomizedRunner's random would solve it somehow? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518659433 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +public class TeamDraftInterleaving implements Interleaving{ + public static Random RANDOM; + + static { +// We try to make things reproducible in the context of our tests by initializing the random instance +// based on the current seed +String seed = System.getProperty("tests.seed"); Review comment: Yes, predictable randomness in tests can be tricky! I had an (unplanned) idea around this at breakfast this morning, trying it out now: https://github.com/cpoerschke/lucene-solr/commit/5fc99898f9bc82f4dabcfe188e132ba2bc727704 is part 1 of 2 (and is independent technically speaking). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518364159 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java ## @@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, SolrParams params, @Override public Query parse() throws SyntaxError { // ReRanking Model - final String modelName = localParams.get(LTRQParserPlugin.MODEL); - if ((modelName == null) || modelName.isEmpty()) { + final String[] modelNames = localParams.getParams(LTRQParserPlugin.MODEL); + if ((modelNames == null) || modelNames.length==0 || modelNames[0].isEmpty()) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Must provide model in the request"); } - - final LTRScoringModel ltrScoringModel = mr.getModel(modelName); - if (ltrScoringModel == null) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, -"cannot find " + LTRQParserPlugin.MODEL + " " + modelName); - } - - final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); - final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); - final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - // Check if features are requested and if the model feature store and feature-transform feature store are the same - final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures:false; - if (threadManager != null) { - threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); - } - final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel, - extractEFIParams(localParams), - featuresRequestedFromSameStore, threadManager); - - // Enable the feature vector caching if we are extracting features, and the features - // we requested are the same ones we are reranking with - if (featuresRequestedFromSameStore) { -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + + LTRScoringQuery[] rerankingQueries = new LTRScoringQuery[modelNames.length]; + for (int i = 0; i < modelNames.length; i++) { +final LTRScoringQuery rerankingQuery; +if (!ORIGINAL_RANKING.equals(modelNames[i])) { + final LTRScoringModel ltrScoringModel = mr.getModel(modelNames[i]); + if (ltrScoringModel == null) { +throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, +"cannot find " + LTRQParserPlugin.MODEL + " " + modelNames[i]); + } + final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); + final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); + final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req);// Check if features are requested and if the model feature store and feature-transform feature store are the same + final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures : false; + if (threadManager != null) { + threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); + } + rerankingQuery = new LTRScoringQuery(ltrScoringModel, + extractEFIParams(localParams), + featuresRequestedFromSameStore, threadManager); + + // Enable the feature vector caching if we are extracting features, and the features + // we requested are the same ones we are reranking with + if (featuresRequestedFromSameStore) { +rerankingQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + } +}else{ + rerankingQuery = new LTRScoringQuery(null); +} + +// External features +rerankingQuery.setRequest(req); +rerankingQueries[i] = rerankingQuery; } - SolrQueryRequestContextUtils.setScoringQuery(req, scoringQuery); + SolrQueryRequestContextUtils.setScoringQuery(req, rerankingQueries); int reRankDocs = localParams.getInt(RERANK_DOCS, DEFAULT_RERANK_DOCS); if (reRankDocs <= 0) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "Must rerank at least 1 document"); +"Must rerank at least 1 document"); + } + if (rerankingQueries.length == 1) { +return new LTRQuery(rerankingQueries[0], reRankDocs); + } else { +return new LTRQuery(rerankingQueries, reRankDocs); } - - // External features - scoringQuery.setRequest(req); - - return new LTRQuery
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518217755 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -210,50 +216,59 @@ public void setContext(ResultContext context) { } // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger - -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); - - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features - -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } + rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req); - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); + docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0); + if (docsWereNotReranked) { +rerankingQueries = new LTRScoringQuery[]{null}; } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length]; + String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); + for (int i = 0; i < rerankingQueries.length; i++) { +LTRScoringQuery scoringQuery = rerankingQueries[i]; +if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { Review comment: 12/n observations/thoughts/questions: Most tricky to articulate, hence left until last. Prior to interleaving the existing logic is that if feature vectors are requested and there is no model (or the model is for a different feature store) then a logging model is created. So now then if we have two models: * if both models are for the requested feature store then that's great and each document would have been picked by one of the models and so we use the feature vector already previously calculated by whatever model had picked the document. * if neither model is for the requested feature store then we need to create a logging model, is one logging model sufficient or do we need two? intuitively to me one would seem to be sufficient but that's based on partial analysis only so far. * if one of the two models (modelA) is for the requested feature store then for the documents picked by modelA we can use the feature vector already previously calculated by modelA. what about documents picked by modelB? it could be that modelA actually has the feature vector for that document but that modelB simply managed to pick the document first. or if modelA does not have the feature vector then we could calculate it for modelA. would a logging model help in this scenario? intuitively to me it would seem that calculating the missing feature vector via modelA or via the logging model would both be equally efficient and hence no logging model would be needed but again that's only based on partial analysis so far. ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obt
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518156348 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRInterleavingRescorer.java ## @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.ltr; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.search.Explanation; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.ScoreDoc; +import org.apache.lucene.search.ScoreMode; +import org.apache.lucene.search.TopDocs; +import org.apache.solr.ltr.interleaving.Interleaving; +import org.apache.solr.ltr.interleaving.InterleavingResult; +import org.apache.solr.ltr.interleaving.TeamDraftInterleaving; + +/** + * Implements the rescoring logic. The top documents returned by solr with their + * original scores, will be processed by a {@link LTRScoringQuery} that will assign a + * new score to each document. The top documents will be resorted based on the + * new score. + * */ +public class LTRInterleavingRescorer extends LTRRescorer { + + LTRScoringQuery[] rerankingQueries; + Interleaving interleavingAlgorithm = new TeamDraftInterleaving(); Review comment: 11.1/n If we go with 4/n then `LTRQParserPlugin.LTRQParser` could pass a `Interleaving interleavingAlgorithm` argument to the `LTRInterleavingQuery` constructor which could pass it to the `LTRInterleavingRescorer` constructor. For now `TeamDraftInterleaving` would be the only supported algorithm but in future other algorithms could then easily be added e.g. based on an additional `ltr` parameter. What do you think? An easy change to make now or something better left for later? ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +public class TeamDraftInterleaving implements Interleaving{ + public static Random RANDOM; + + static { +// We try to make things reproducible in the context of our tests by initializing the random instance +// based on the current seed +String seed = System.getProperty("tests.seed"); Review comment: 11.2/n `LTRQParserPlugin.LTRQParser` also has access to the `SolrQueryRequest` and its `SolrCore` object. For some reason I thought that within that some 'official' source of random-ness might be available which could be passed to a `TeamDraftInterleaving(Random)` constructor. And I imagined that our test harnesses would use seeds to make tests reproducible w.r.t. that 'official' source of random-ness. There however doesn't appear to be such a source of non-test official random-ness? `System.getProperty("tests.seed");` being used/available to non-test code seems potentially tricky. @dweiss would you perhaps have any insights around non-test sources of randomness? This is an automated message from the Apache Git Service. To respond to the message, p
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518035475 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRInterleavingRescorer.java ## @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.ltr; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.search.Explanation; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.ScoreDoc; +import org.apache.lucene.search.ScoreMode; +import org.apache.lucene.search.TopDocs; +import org.apache.solr.ltr.interleaving.Interleaving; +import org.apache.solr.ltr.interleaving.InterleavingResult; +import org.apache.solr.ltr.interleaving.TeamDraftInterleaving; + +import static org.apache.solr.ltr.search.LTRQParserPlugin.isOriginalRanking; + +/** + * Implements the rescoring logic. The top documents returned by solr with their + * original scores, will be processed by a {@link LTRScoringQuery} that will assign a + * new score to each document. The top documents will be resorted based on the + * new score. + * */ +public class LTRInterleavingRescorer extends LTRRescorer { + + LTRScoringQuery[] rerankingQueries; + Interleaving interleavingAlgorithm = new TeamDraftInterleaving(); + + public LTRInterleavingRescorer(LTRScoringQuery[] rerankingQueries) { +this.rerankingQueries = rerankingQueries; + } + + /** + * rescores the documents: + * + * @param searcher + * current IndexSearcher + * @param firstPassTopDocs + * documents to rerank; + * @param topN + * documents to return; + */ + @Override + public TopDocs rescore(IndexSearcher searcher, TopDocs firstPassTopDocs, + int topN) throws IOException { +if ((topN == 0) || (firstPassTopDocs.scoreDocs.length == 0)) { + return firstPassTopDocs; +} + +int originalRankingIndex = -1; Review comment: 10/n suggestions: * calculate originalRankingIndex in the constructor * use originalRankingIndex to minimise "rerankingQueries[i] is not original ranking" checks * use originalRankingIndex to remove the "original ranking query is always the last query" assumption * the 'rerank' method already takes a `searcher` and so it could determine its own `leaves` from that https://github.com/cpoerschke/lucene-solr/commit/002c31cceaeff411ad35111f00b7cee71c5b11de This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517960447 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java ## @@ -166,64 +186,77 @@ public void scoreFeatures(IndexSearcher indexSearcher, TopDocs firstPassTopDocs, docBase = readerContext.docBase; scorer = modelWeight.scorer(readerContext); } - // Scorer for a LTRScoringQuery.ModelWeight should never be null since we always have to - // call score - // even if no feature scorers match, since a model might use that info to - // return a - // non-zero score. Same applies for the case of advancing a LTRScoringQuery.ModelWeight.ModelScorer - // past the target - // doc since the model algorithm still needs to compute a potentially - // non-zero score from blank features. - assert (scorer != null); - final int targetDoc = docID - docBase; - scorer.docID(); - scorer.iterator().advance(targetDoc); - - scorer.getDocInfo().setOriginalDocScore(hit.score); - hit.score = scorer.score(); - if (hitUpto < topN) { -reranked[hitUpto] = hit; -// if the heap is not full, maybe I want to log the features for this -// document + scoreSingleHit(indexSearcher, topN, modelWeight, docBase, hitUpto, hit, docID, scoringQuery, scorer, reranked); Review comment: 9/n observation and suggestions: * very elegant factoring out of methods for use by `LTRInterleavingRescorer` * the `rerank` method already takes a `searcher` and so it could determine its own `leaves` from that * `this.scoringQuery` being passed to the `scoreSingleHit` method as an argument (rather than it using the `this.scoringQuery` directly) is very subtle. it is of course present as an argument because `LTRInterleavingRescorer` will passing `rerankingQueries[i]` for that argument. the subtlety could be removed by making `scoreSingleHit` a static method. https://github.com/cpoerschke/lucene-solr/commit/16512dbddbabe12ca5a2a26a9180ceeaae62cea2 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517889175 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java ## @@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, SolrParams params, @Override public Query parse() throws SyntaxError { Review comment: 7/n minor comments: * technically not just `modelNames[0]` but all the model names could be `isEmpty` checked * `threadManager.setExecutor` (already mentioned in 6/n) need not be inside the loop, likewise `extractFeatures` and `fvStoreName` and `extractEFIParams(localParams)` could move out. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +public class TeamDraftInterleaving implements Interleaving{ Review comment: 8/n suggestions: * class level javadocs re: the interleaving algorithm * comments or javadocs re: any assumptions e.g. * must rerankedA.length and rerankedB.length match? * can rerankedA and rerankedB contain the same docs? * can rerankedA contain the same doc more than once? * can rerankedB contain the same doc more than once? * consider guarding against array-index-out-of-bounds exceptions (even if they shouldn't happen if all assumptions are met) ``` indexA = updateIndex(interleavedResults,indexA,rerankedA); if (indexA < rerankedA.length) { interleavedResults.add(rerankedA[indexA]); teamA.add(rerankedA[indexA].doc); indexA++; } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517514900 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java ## @@ -73,6 +74,8 @@ final private Map efi; // Original solr query used to fetch matching documents private Query originalQuery; + // Model was picked for this Docs + private Set pickedInterleavingDocIds; Review comment: 4.1/n The addition of a new member here which is only sometimes applicable got me wondering about possibly having `LTRInterleavingScoringQuery extends LTRScoringQuery` inheritance. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java ## @@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, SolrParams params, @Override public Query parse() throws SyntaxError { // ReRanking Model - final String modelName = localParams.get(LTRQParserPlugin.MODEL); - if ((modelName == null) || modelName.isEmpty()) { + final String[] modelNames = localParams.getParams(LTRQParserPlugin.MODEL); + if ((modelNames == null) || modelNames.length==0 || modelNames[0].isEmpty()) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Must provide model in the request"); } - - final LTRScoringModel ltrScoringModel = mr.getModel(modelName); - if (ltrScoringModel == null) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, -"cannot find " + LTRQParserPlugin.MODEL + " " + modelName); - } - - final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); - final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); - final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - // Check if features are requested and if the model feature store and feature-transform feature store are the same - final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures:false; - if (threadManager != null) { - threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); - } - final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel, - extractEFIParams(localParams), - featuresRequestedFromSameStore, threadManager); - - // Enable the feature vector caching if we are extracting features, and the features - // we requested are the same ones we are reranking with - if (featuresRequestedFromSameStore) { -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + + LTRScoringQuery[] rerankingQueries = new LTRScoringQuery[modelNames.length]; + for (int i = 0; i < modelNames.length; i++) { +final LTRScoringQuery rerankingQuery; +if (!ORIGINAL_RANKING.equals(modelNames[i])) { + final LTRScoringModel ltrScoringModel = mr.getModel(modelNames[i]); + if (ltrScoringModel == null) { +throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, +"cannot find " + LTRQParserPlugin.MODEL + " " + modelNames[i]); + } + final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); + final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); + final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req);// Check if features are requested and if the model feature store and feature-transform feature store are the same + final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures : false; + if (threadManager != null) { + threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); + } + rerankingQuery = new LTRScoringQuery(ltrScoringModel, + extractEFIParams(localParams), + featuresRequestedFromSameStore, threadManager); + + // Enable the feature vector caching if we are extracting features, and the features + // we requested are the same ones we are reranking with + if (featuresRequestedFromSameStore) { +rerankingQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + } +}else{ + rerankingQuery = new LTRScoringQuery(null); +} + +// External features +rerankingQuery.setRequest(req); +rerankingQueries[i] = rerankingQuery; } - SolrQueryRequestContextUtils.setScoringQuery(req, scoringQuery); + SolrQueryRequestContextUtils.setScoringQuery(req, rerankingQueries); int reRankDocs = localParams.getInt(RERAN
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r517502678 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java ## @@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, SolrParams params, @Override public Query parse() throws SyntaxError { // ReRanking Model - final String modelName = localParams.get(LTRQParserPlugin.MODEL); - if ((modelName == null) || modelName.isEmpty()) { + final String[] modelNames = localParams.getParams(LTRQParserPlugin.MODEL); + if ((modelNames == null) || modelNames.length==0 || modelNames[0].isEmpty()) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Must provide model in the request"); } - - final LTRScoringModel ltrScoringModel = mr.getModel(modelName); - if (ltrScoringModel == null) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, -"cannot find " + LTRQParserPlugin.MODEL + " " + modelName); - } - - final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); - final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); - final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - // Check if features are requested and if the model feature store and feature-transform feature store are the same - final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures:false; - if (threadManager != null) { - threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); - } - final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel, - extractEFIParams(localParams), - featuresRequestedFromSameStore, threadManager); - - // Enable the feature vector caching if we are extracting features, and the features - // we requested are the same ones we are reranking with - if (featuresRequestedFromSameStore) { -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + + LTRScoringQuery[] rerankingQueries = new LTRScoringQuery[modelNames.length]; + for (int i = 0; i < modelNames.length; i++) { +final LTRScoringQuery rerankingQuery; +if (!ORIGINAL_RANKING.equals(modelNames[i])) { Review comment: Hi @alessandrobenedetti! Apologies for not returning to here sooner. Good point about Apache Solr already using "special names" in various places, sure, let's go with "_OriginalRanking_" as the special name then. Hmm, the UI treats underscore special here, so `"_OriginalRanking_"` is what's in the code currently. > ... we could move the review to the re-scoring part, that is more delicate ... "delicate" is an excellent word choice, thank you, I spent some pleasant time today continuing to learn more about the re-scoring part. Took a sort of outside-to-inside approach i.e. starting at `LTRQParserPlugin` and then looking at what changed and how the bits fit together. If it's okay with you I'll push a branch to my fork and add comments/questions sometimes with commit links here, not quite sure if that might work, let's try? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r498948018 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java ## @@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, SolrParams params, @Override public Query parse() throws SyntaxError { // ReRanking Model - final String modelName = localParams.get(LTRQParserPlugin.MODEL); - if ((modelName == null) || modelName.isEmpty()) { + final String[] modelNames = localParams.getParams(LTRQParserPlugin.MODEL); + if ((modelNames == null) || modelNames.length==0 || modelNames[0].isEmpty()) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Must provide model in the request"); } - - final LTRScoringModel ltrScoringModel = mr.getModel(modelName); - if (ltrScoringModel == null) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, -"cannot find " + LTRQParserPlugin.MODEL + " " + modelName); - } - - final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); - final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); - final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - // Check if features are requested and if the model feature store and feature-transform feature store are the same - final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures:false; - if (threadManager != null) { - threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); - } - final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel, - extractEFIParams(localParams), - featuresRequestedFromSameStore, threadManager); - - // Enable the feature vector caching if we are extracting features, and the features - // we requested are the same ones we are reranking with - if (featuresRequestedFromSameStore) { -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + + LTRScoringQuery[] rerankingQueries = new LTRScoringQuery[modelNames.length]; + for (int i = 0; i < modelNames.length; i++) { +final LTRScoringQuery rerankingQuery; +if (!ORIGINAL_RANKING.equals(modelNames[i])) { Review comment: Ah, good point about the special "OriginalRanking" also appearing in the "[interleaving]" transformer! When using interleaving there's always at least two models to be interleaved, right? The models could all be actual models or one of them could be the "OriginalRanking" pseudo-model. I wonder if class inheritance might help us e.g. ``` class InterleavingLTRQParserPlugin extends LTRQParserPlugin ``` and (say) ``` ``` where ``` rq={!iltr model=myModelXYZ} ``` can be a convenience equivalent to ``` rq={!iltr model=OriginalRanking model=myModelXYZ} ``` i.e. if only one model is supplied then it is implied that the second model is the original ranking. And if the special "OriginalRanking" name doesn't suit someone (either because they already have a real model that happens to be called "OriginalRanking" or because they would prefer a different descriptor in the "[interleaving]" transformer output) then something like ``` rq={!iltr originalRankingModel=noModel model=noModel model=someModel} ``` would allow them to call the "OriginalRanking" something else e.g. "noModel" instead. We could even reject any "OriginalRanking" models that are actual models via something like ``` final String originalRankingModelName = localParams.get("originalRankingModel", "OriginalRanking" /* default */); if (null != mr.getModel(originalRankingModelName)) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Found an actual '" + originalRankingModelName + "' model, please ... } ``` However, the "[interleaving]" transformer still needs to know about the special name, hmm. At the moment we have ``` public static boolean isOriginalRanking(LTRScoringQuery rerankingQuery){ return rerankingQuery.getScoringModel() == null; } ``` in LTRQParserPlugin and in LTRInterleavingTransformerFactory ``` if (isOriginalRanking(rerankingQuery)) { doc.addField(name, ORIGINAL_RANKING); } else { doc.addField(name, rerankingQuery.getScoringModel().getName()); } ``` and if we gave LTRScoringQuery a getScoringModelName method ``` public String getScoringModelName() { return ltrScoringModel.getName(); } ```
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r462456864 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java ## @@ -146,93 +149,114 @@ public LTRQParser(String qstr, SolrParams localParams, SolrParams params, @Override public Query parse() throws SyntaxError { // ReRanking Model - final String modelName = localParams.get(LTRQParserPlugin.MODEL); - if ((modelName == null) || modelName.isEmpty()) { + final String[] modelNames = localParams.getParams(LTRQParserPlugin.MODEL); + if ((modelNames == null) || modelNames.length==0 || modelNames[0].isEmpty()) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Must provide model in the request"); } - - final LTRScoringModel ltrScoringModel = mr.getModel(modelName); - if (ltrScoringModel == null) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, -"cannot find " + LTRQParserPlugin.MODEL + " " + modelName); - } - - final String modelFeatureStoreName = ltrScoringModel.getFeatureStoreName(); - final boolean extractFeatures = SolrQueryRequestContextUtils.isExtractingFeatures(req); - final String fvStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - // Check if features are requested and if the model feature store and feature-transform feature store are the same - final boolean featuresRequestedFromSameStore = (modelFeatureStoreName.equals(fvStoreName) || fvStoreName == null) ? extractFeatures:false; - if (threadManager != null) { - threadManager.setExecutor(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); - } - final LTRScoringQuery scoringQuery = new LTRScoringQuery(ltrScoringModel, - extractEFIParams(localParams), - featuresRequestedFromSameStore, threadManager); - - // Enable the feature vector caching if we are extracting features, and the features - // we requested are the same ones we are reranking with - if (featuresRequestedFromSameStore) { -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); + + LTRScoringQuery[] rerankingQueries = new LTRScoringQuery[modelNames.length]; + for (int i = 0; i < modelNames.length; i++) { +final LTRScoringQuery rerankingQuery; +if (!ORIGINAL_RANKING.equals(modelNames[i])) { Review comment: The model name `originalRanking` is being given a special meaning here. I wonder if perhaps the differences between models could be transferred to the parameter names somehow (e.g. a new `original_model` parameter name alongside the existing `model` parameter name)? Then users could choose any model name they like, including for "original ranking" purposes. https://github.com/apache/lucene-solr/pull/1705 proposes to factor out a `LTRQParserPlugin.newLTRScoringQuery` method (but I haven't yet explored fully w.r.t. how that might connect up here w.r.t. additional parameter names). What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org