[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504909#comment-16504909
 ] 

Alessandro Benedetti commented on LUCENE-8343:
----------------------------------------------

Hi [~jpountz],

thanks for your time. Here is a quick explanation:

The (positional) coefficient should be a double with 0<=x<=1, calculated with 
one of 3 possible formulas from the position of the first matching query term 
in the suggestion (linear doesn't respect that constraint and goes negative 
for positions more than 10 away from the beginning):
 * *position_linear*: (1 - 0.10*position): Matches closer to the start are 
given a higher score (Default)
 * *position_reciprocal*: 1/(1+position): Matches closer to the start are 
given a score which decays faster than linear
 * *position_exponential_reciprocal*: 1/pow(1+position,exponent): Matches 
closer to the start are given a score which decays faster than reciprocal
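
A minimal Java sketch of the three formulas above, for illustration only. The 
class and method names are hypothetical and are not taken from Lucene or from 
the attached patch; position is assumed to be a zero-based index. Note that 
the exponential variant only decays faster than the reciprocal one when the 
exponent is greater than 1.

```java
/** Illustrative helper only; the names here are hypothetical, not Lucene API. */
final class PositionCoefficients {
    private PositionCoefficients() {}

    /** position_linear: 1 - 0.10 * position; negative once position exceeds 10. */
    static double linear(int position) {
        return 1.0 - 0.10 * position;
    }

    /** position_reciprocal: 1 / (1 + position); always in (0, 1]. */
    static double reciprocal(int position) {
        return 1.0 / (1.0 + position);
    }

    /** position_exponential_reciprocal: 1 / pow(1 + position, exponent). */
    static double exponentialReciprocal(int position, double exponent) {
        return 1.0 / Math.pow(1.0 + position, exponent);
    }
}
```

For a match at position 0 all three return 1.0; for position 11 the linear 
variant is already below 0, while the other two stay positive.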

To answer your questions:

1) "turning weight=0 into 1": this is an interesting one.
You don't want all your weights to be 0 for the BlendedInfixSuggester, because 
that would flatten the positional score of every suggestion to 0, and the 
positional score is the only reason to use the Blended Infix (if you are not 
interested in the positional score for the suggestion, you should use the 
parent suggester, AnalyzingInfixSuggester).
If you don't configure the weight field (which is not and shouldn't be 
mandatory), all your weights default to 0 
(org.apache.lucene.search.suggest.DocumentDictionary.DocumentInputIterator#getWeight)
and your BlendedInfixSuggester no longer blends anything: it scores each 
suggestion a constant 0.
That was the reason for moving a weight of 0 up to the next larger value (which 
for a long data type is 1).
With that fix you lose the ability to give certain suggestions a weight of 0 
(users can only drop them to a weight of 1), but you gain a good bug fix for 
the missing weight field scenario.
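
A small sketch of the idea, assuming the fix simply maps a weight of 0 to 1 
before blending. WeightFloor, effectiveWeight and blend are hypothetical names 
used for illustration, not the patch's actual code, and the product is kept as 
a double here so the effect is visible before any truncation to long.

```java
/** Illustrative only; hypothetical names, not the actual patch. */
final class WeightFloor {
    private WeightFloor() {}

    /** Maps a missing/zero weight to 1, the next larger long value. */
    static long effectiveWeight(long weight) {
        return weight == 0 ? 1 : weight;
    }

    /** Weight multiplied by the positional coefficient, before truncation. */
    static double blend(long weight, double coefficient) {
        return weight * coefficient;
    }
}
```

With the raw weight, blend(0, 0.5) is 0 for every position; with 
blend(effectiveWeight(0), 0.5) the positional coefficient survives.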

2) The choice of 10 was completely arbitrary, to get at least 10 possible 
ranked outcomes out of the positional coefficient. 
You may end up with overflows if:
- the weight is already big enough.
You are right, maybe we can apply that scaling factor only if the weight is 
small.
- the linear coefficient goes deeply negative (we can limit the coefficient 
to a minimum of 0, which will also give Linear a behaviour similar to its 
sibling blender types)
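
Both mitigations can be sketched together as follows. This is an illustration 
under stated assumptions, not the patch: ScaledBlend and its members are 
hypothetical names, and "small" is interpreted here as "small enough that 
multiplying by 10 cannot overflow a long", which is only one possible reading.

```java
/** Illustrative only; hypothetical names and threshold, not the actual patch. */
final class ScaledBlend {
    private ScaledBlend() {}

    /** The arbitrary scaling factor of 10 discussed above. */
    static final long SCALE = 10;

    static long score(long weight, double coefficient) {
        // Clamp so the linear coefficient can never go below 0.
        double clamped = Math.max(0.0, coefficient);
        // Assumption: scale only when weight * SCALE cannot overflow.
        boolean small = weight <= Long.MAX_VALUE / SCALE
                && weight >= Long.MIN_VALUE / SCALE;
        long scaled = small ? weight * SCALE : weight;
        // The score is a long, so the fractional part is truncated.
        return (long) (scaled * clamped);
    }
}
```

Without scaling, (long) (1 * 0.5) truncates to 0; with it, score(1, 0.5) 
gives 5, so a weight of 1 can still produce distinct ranked outcomes.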

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-8343
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8343
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 7.3.1
>            Reporter: Alessandro Benedetti
>            Priority: Major
>         Attachments: LUCENE-8343.patch, LUCENE-8343.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as a multiplication between :
> long *Weight* : the suggestion weight, coming from a document field, it can 
> be any long value ( including 1, 0,.. )
> double *Coefficient* : 0<=x<=1, calculated based on the position match, 
> the earlier the better
> The resulting score is a long, which means that at the moment, any weight<10 
> can bring inconsistencies.
> *Edge cases* 
> Weight =1
> Score = 1( if we have a match at the beginning of the suggestion) or 0 ( for 
> any other match)
> Weight =0
> Score = 0 ( independently of the position match coefficient)


