Re: Hybrid scoring lexical / vector

Joel Bernstein Thu, 01 Jun 2023 07:55:11 -0700

I'll also add an implementation of RRF (Reciprocal Rank Fusion) to the
ReRanker.

https://weaviate.io/blog/hybrid-search-explained
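
For reference, a minimal sketch of how RRF is commonly defined: each document's
fused score is the sum over the input rankings of 1 / (k + rank). This is only
an illustration, not the planned ReRanker code; the function name, the sample
doc ids, and the tie-break on doc id are assumptions, and k = 60 is just a
commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids into (doc_id, score) pairs, best first.

    Hypothetical helper: only ranks are used, so the raw lexical and
    vector scores never need to be on the same scale.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists from a lexical query and a vector query.
lexical = ["doc1", "doc2", "doc3"]
vector = ["doc3", "doc1", "doc4"]
fused = reciprocal_rank_fusion([lexical, vector])
```

Because only ranks enter the formula, the unbounded lexical score does not
need to be scaled or squashed first.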




Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, May 26, 2023 at 4:58 PM Joel Bernstein <joels...@gmail.com> wrote:

> I'm going to create a ticket for adding Min/Max scaling to the ReRanker.
> The ReRanker has access to all the topDocs so it should be pretty
> straightforward to min/max scale all the topDocs before ReRanking the topN.
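
A rough sketch of that idea, not the actual ReRanker code: min/max scale the
main query scores over all the topDocs, then combine only the topN with the
rerank scores. The function names, the additive combination with a weight, and
the choice to map equal scores to 0.0 are all assumptions made for
illustration.

```python
def min_max_scale(scores):
    """Scale scores into [0, 1] using the min and max of the whole list."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case (all scores equal): an arbitrary choice here.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def rerank_with_min_max(top_docs, rerank_scores, top_n, rerank_weight=1.0):
    """top_docs: (doc_id, main_score) pairs sorted by main score, descending.
    rerank_scores: doc_id -> rerank query score (missing ids count as 0.0).
    """
    # Scale over ALL top docs, not just the top N that get reranked.
    scaled = min_max_scale([score for _, score in top_docs])
    combined = []
    for i, ((doc_id, _), main) in enumerate(zip(top_docs, scaled)):
        if i < top_n:
            main += rerank_weight * rerank_scores.get(doc_id, 0.0)
        combined.append((doc_id, main))
    # Only the top N are reordered; the tail keeps its original order.
    head = sorted(combined[:top_n], key=lambda pair: -pair[1])
    return head + combined[top_n:]


# Made-up scores: unbounded main scores, rerank scores already in [0, 1].
result = rerank_with_min_max(
    [("a", 12.0), ("b", 7.0), ("c", 2.0)],
    {"a": 0.1, "b": 0.9},
    top_n=2,
)
```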
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, May 25, 2023 at 5:18 AM Alessandro Benedetti <a.benede...@sease.io>
> wrote:
>
>> Hi all,
>> our approach to providing hybrid search in Solr has been focused on the
>> reranking side, specifically enabling vector-based features in Learning To
>> Rank.
>> In this way, you can combine lexical features (such as the original BM25
>> score) with various vector distances (in more than one field if you like)
>> and other factors, using whatever model is supported (linear, tree-based,
>> neural network).
>> First-stage hybrid retrieval should already be reasonably well supported
>> through the boolean query parser.
>>
>> We started the work with function queries, which unfortunately are
>> scattered across Lucene and Solr; now that the projects are separate
>> again, it's a lengthy process.
>> Our first step is almost ready:
>> https://github.com/apache/lucene/pull/12253
>> Any feedback is welcome!
>>
>> Then, regarding the separate problem of the unbounded relevance score
>> in Lucene/Solr: I agree that it can (and should) be improved. I would love
>> to see it as a probabilistic score, but I imagine that making this change
>> in Lucene would cause an enormous discussion, probably ending in a
>> standstill.
>> You have my support!
>>
>>
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
>>
>>
>> On Tue, 23 May 2023 at 19:17, Mikhail Khludnev <m...@apache.org> wrote:
>>
>> > Hello, Joel.
>> >
>> > Here's my idea
>> > https://lists.apache.org/thread/6t45p5fk4hldrt1833kvrbobdd2pk265
>> >
>> >
>> > On Tue, May 23, 2023 at 6:20 PM Joel Bernstein <joels...@gmail.com> wrote:
>> >
>> > > One of the things that I'm focusing on is combining the Solr similarity
>> > > score with the vector score in a consistent manner. My main concern is
>> > > dealing with the unbounded nature of the Solr similarity score and how to
>> > > balance that with a vector score.
>> > >
>> > > So my first question is: are there any mechanisms now to scale or squash
>> > > the Solr similarity score before combining it with a vector score?
>> > >
>> > > Below are two ideas I have for squashing / scaling the score:
>> > >
>> > > 1) SquashingScoreQuery. This is a wrapper query that squashes the score
>> > > of its wrapped query using a sigmoid function.
>> > >
>> > > 2) Min/Max scale the main query score in the ReRanker. This simply adds a
>> > > flag to the ReRanker to min/max scale the main query scores before
>> > > combining with the ReRank query.
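
To make the squashing idea in item 1 concrete, here is a small sketch of a
sigmoid squash followed by a weighted sum with a vector score. It is not a
SquashingScoreQuery implementation; the function names and the midpoint,
steepness, and lexical_weight parameters are hypothetical and would need
tuning per collection.

```python
import math


def squash(score, midpoint=0.0, steepness=1.0):
    """Map an unbounded score into [0, 1] with a logistic (sigmoid) curve."""
    z = steepness * (score - midpoint)
    # Two branches so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hybrid_score(lexical_score, vector_score, midpoint=5.0, lexical_weight=0.5):
    """Weighted sum of the squashed lexical score and a vector score.

    Assumes vector_score is already bounded, e.g. a similarity in [0, 1].
    """
    squashed = squash(lexical_score, midpoint=midpoint)
    return lexical_weight * squashed + (1.0 - lexical_weight) * vector_score


# Made-up inputs: a lexical score of 5.0 and a vector similarity of 0.8.
example = hybrid_score(5.0, 0.8)
```

Since lexical scores are non-negative, a midpoint of 0.0 would only use the
upper half of the curve (0.5 and above), which is why the sketch shifts the
midpoint.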
>> > >
>> > > Do others have thoughts on this?
>> > >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > https://t.me/MUST_SEARCH
>> > A caveat: Cyrillic!
>> >
>>
>
