Hi Mikhail,
I was thinking again about this in the context of Hybrid Search in Solr and
the current scale function:
https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
Was there any progress on this? Any traction?
Sooner or later I hope to get some funding to work on this; I'll keep you
updated!
I agree this would be useful in Learning To Rank and Hybrid Search in
general.
The current original-score feature is unlikely to be useful unless it is
normalised by an estimated maximum score.
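For what it's worth, here is a minimal sketch of that kind of normalisation,
assuming the standard Lucene BM25 formula: since each term's BM25 contribution
approaches idf * (k1 + 1) as tf grows, summing that over the query terms gives
an upper bound on the query's score, and dividing the raw score by it yields a
value in [0, 1]. The class name, numbers, and stats below are made up for
illustration; this is not the LikelyReader approach, just the crude closed-form
bound.

```java
// Sketch: normalise a raw BM25 score by a per-query upper bound.
// Assumes Lucene's default BM25 idf: log(1 + (N - df + 0.5) / (df + 0.5)).
// As tf -> infinity, a term's score approaches idf * (k1 + 1), so the sum
// over query terms bounds the maximum possible score for the query.
public class ScoreNormaliser {
    static final double K1 = 1.2; // Lucene's default BM25 k1

    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Upper bound on the BM25 score of a disjunction of the query terms.
    static double maxPossibleScore(long docCount, long[] docFreqs) {
        double bound = 0;
        for (long df : docFreqs) {
            bound += idf(docCount, df) * (K1 + 1);
        }
        return bound;
    }

    static double normalise(double rawScore, long docCount, long[] docFreqs) {
        return rawScore / maxPossibleScore(docCount, docFreqs);
    }

    public static void main(String[] args) {
        // Hypothetical stats: 1000 docs, three terms with docFreq 10, 50, 200.
        long[] docFreqs = {10, 50, 200};
        double n = normalise(3.0, 1000, docFreqs);
        System.out.println(n > 0 && n <= 1.0); // normalised score lands in (0, 1]
    }
}
```

The bound is loose (it ignores field length and assumes unbounded tf), which is
exactly why a tighter estimate like the LikelyReader prototype below is
interesting.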

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev <m...@apache.org> wrote:

> Hello.
> Just FYI. I scratched a little prototype
> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
> To estimate the maximum possible score for a query against an index:
>  - it creates a virtual index (LikelyReader), which
>  - contains all terms from the original index with the same docCount, and
>  - matches all of these terms in the first doc (docnum=0) with the
> maximum termFreq (estimating that is a separate question).
> So, if we search over this LikelyReader we get a score estimate, which can
> hardly be exceeded by the same query over the original index.
> I suppose this might be useful for LTR as a better alternative to the
> query score feature.
>
> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <m...@apache.org> wrote:
>
>> Hello dev!
>> Users are interested in the meaning of the absolute value of the score,
>> but we always reply that it's just a relative value. The maximum score of
>> the matched docs is not an answer.
>> Ultimately we need to measure how much sense a query makes in the index,
>> e.g. the query [jet OR propulsion OR spider] should be measured as
>> nonsense, because the best-matching docs score much lower than a
>> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
>> spider].
>> Could there be a method that returns the maximum possible score if all
>> query terms matched? Something like stubbing postings on a virtual
>> all-matching doc with average stats like tf and field length, and kicking
>> the scorers in? It reminds me of something from probabilistic retrieval,
>> but not much. Is there anything like this already?
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
