I'll also add an implementation for RRF to the ReRanker. https://weaviate.io/blog/hybrid-search-explained
Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 26, 2023 at 4:58 PM Joel Bernstein <joels...@gmail.com> wrote:

> I'm going to create a ticket for adding Min/Max scaling to the ReRanker.
> The ReRanker has access to all the topDocs so it should be pretty
> straightforward to min/max scale all the topDocs before ReRanking the topN.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 25, 2023 at 5:18 AM Alessandro Benedetti <a.benede...@sease.io>
> wrote:
>
>> Hi all,
>> our approach to providing hybrid search in Solr has been focused on the
>> reranking side, specifically enabling vector-based features in Learning To
>> Rank.
>> In this way, you can combine lexical features (such as the original BM25
>> score) with various vector distances (in more than one field if you like)
>> and other factors using whatever model is supported (linear, tree-based,
>> neural network).
>> To do first-stage hybrid retrieval, that should be already decently
>> available through the boolean query parser.
>>
>> We started the work with function queries (that unfortunately are
>> scattered across Lucene and Solr, and now that the projects are separate
>> again, it's a lengthy process to go with).
>> Our first step is almost ready:
>> https://github.com/apache/lucene/pull/12253
>> Any feedback is welcome!
>>
>> Then regarding the different problem of having an unbounded relevance score
>> in Lucene/Solr, I agree that can (and should) be improved, I would love to
>> see it as a probabilistic score, but I imagine that making this change in
>> Lucene will cause an enormous discussion, probably ending in a standstill?
>> You have my support!
>>
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
>>
>> On Tue, 23 May 2023 at 19:17, Mikhail Khludnev <m...@apache.org> wrote:
>>
>> > Hello, Joel.
>> >
>> > Here's my idea
>> > https://lists.apache.org/thread/6t45p5fk4hldrt1833kvrbobdd2pk265
>> >
>> > On Tue, May 23, 2023 at 6:20 PM Joel Bernstein <joels...@gmail.com>
>> > wrote:
>> >
>> > > One of the things that I'm focusing on is combining the Solr similarity
>> > > score with the vector score in a consistent manner. My main concern is
>> > > dealing with the unbounded nature of the Solr similarity score and how to
>> > > balance that with a vector score.
>> > >
>> > > So my first question is: are there any mechanisms now to scale or squash
>> > > the Solr similarity score before combining with a vector score?
>> > >
>> > > Below are two ideas I have for squashing / scaling the score:
>> > >
>> > > 1) SquashingScoreQuery. This is a wrapper query that squashes the score
>> > > of its wrapped query using a sigmoid function.
>> > >
>> > > 2) Min/Max scale the main query score in the ReRanker. This simply adds a
>> > > flag to the ReRanker to min/max scale the main query scores before
>> > > combining with the ReRank query.
>> > >
>> > > Do others have thoughts on this?
>> > >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > https://t.me/MUST_SEARCH
>> > A caveat: Cyrillic!
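
For readers following the thread, the three score-combination ideas discussed above (sigmoid squashing, min/max scaling, and RRF) can be sketched roughly as below. This is plain Python for illustration only, not Solr or Lucene code: the function names, the `midpoint` and `steepness` parameters, and the choice to map an all-equal score list to 0.0 are assumptions made for this sketch. The `k=60` constant is the value commonly used for Reciprocal Rank Fusion and is likewise only a default here.

```python
import math
from collections import defaultdict


def min_max_scale(scores):
    """Scale a list of scores into [0, 1] using the list's own min and max.

    Assumption for this sketch: if every score is equal, all map to 0.0.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sigmoid_squash(score, midpoint=0.0, steepness=1.0):
    """Squash an unbounded score into [0, 1] with a logistic function.

    `midpoint` and `steepness` are hypothetical tuning knobs, not Solr parameters.
    The two branches avoid overflow in math.exp for large |z|.
    """
    z = steepness * (score - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids by summing 1 / (k + rank).

    Ranks are 1-based; a document missing from a list contributes nothing for it.
    Returns (doc_id, fused_score) pairs, best first.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    lexical_scores = [12.4, 7.9, 3.1]          # made-up unbounded lexical scores
    print(min_max_scale(lexical_scores))
    print([sigmoid_squash(s, midpoint=8.0, steepness=0.5) for s in lexical_scores])
    print(reciprocal_rank_fusion([["a", "b", "c"], ["a", "c", "d"]]))
```

Min/max scaling depends on the set of documents being scaled (here, the topDocs), while the sigmoid is independent of the result set but needs its midpoint and steepness chosen to suit the score distribution. RRF sidesteps the scale problem by using only rank positions.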