[jira] [Comment Edited] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-30 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561918#comment-16561918
 ] 

Alessandro Benedetti edited comment on LUCENE-8343 at 7/30/18 1:25 PM:
---

Hi [~mikemccand], I just updated the Pull Request and patch.

I have run ant precommit and it seems fine to me.
I have also executed some of the tests (the ones related to the
Suggesters).
Some Solr tests were failing, so I addressed that as well.
Let me know; I'm happy to take care of anything that is missing.
I will monitor the Jira issue and check when the robot returns the checks and
test results.



> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the
> suggestions.
> This score is calculated as the product of:
> * long *Weight*: the suggestion weight, coming from a document field; it can
> be any long value (including 1, 0, ...)
> * double *Coefficient*: 0<=x<=1, calculated from the position of the match
> (the earlier, the better)
> The resulting score is truncated to a long, which means that, at the moment,
> any weight<10 can produce inconsistent rankings.
> *Edge cases*
> * Weight = 1: Score = 1 (if we have a match at the beginning of the
> suggestion) or 0 (for any other match)
> * Weight = 0: Score = 0 (independently of the position match coefficient)
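To make the edge cases in the issue description concrete, here is a minimal Java sketch of the product it describes. It assumes the double-to-long conversion is a plain Java cast, which truncates toward zero and matches the listed edge cases; `TruncationDemo` and `score` are hypothetical names, and the reciprocal coefficient is used only as an example.

```java
// Hypothetical illustration of the long score described in LUCENE-8343 (not Lucene's code).
final class TruncationDemo {

    // Assumption: score = (long) (weight * coefficient); the cast truncates toward zero.
    static long score(long weight, double coefficient) {
        return (long) (weight * coefficient);
    }

    public static void main(String[] args) {
        for (int position = 0; position < 4; position++) {
            // position_reciprocal coefficient, chosen only for the example
            double coefficient = 1.0 / (1 + position);
            System.out.printf("position=%d  weight=0 -> %d  weight=1 -> %d  weight=100 -> %d%n",
                    position,
                    score(0, coefficient),
                    score(1, coefficient),
                    score(100, coefficient));
        }
    }
}
```

With weight 1 only a match at position 0 scores 1 and every later position collapses to 0. With weight 0 every position scores 0. Weight 100 keeps the positions apart (100, 50, 33, 25).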



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504909#comment-16504909
 ] 

Alessandro Benedetti edited comment on LUCENE-8343 at 6/7/18 5:42 PM:
--

Hi [~jpountz],

Thanks for your time. Here is a quick explanation:

The (positional) coefficient should be a double 0<=x<=1, calculated with one of
3 possible formulas from the position of the first matching query term in the
suggestion (linear doesn't respect that constraint and goes negative for
positions more than 10 from the beginning):
 * *position_linear*: (1 - 0.10*position): Matches closer to the start are given
a higher score (Default)
 * *position_reciprocal*: 1/(1+position): Matches closer to the start are given
a score that decays faster than linear
 * *position_exponential_reciprocal*: 1/pow(1+position,exponent): Matches closer
to the start are given a score that decays faster than reciprocal
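The three formulas listed above can be sketched in Java as follows. This is a minimal illustration, not the BlendedInfixSuggester source: the class and method names are hypothetical, and the exponent of 2.0 in `main` is an arbitrary example value.

```java
// Hypothetical sketch of the three positional coefficients (not Lucene's code).
final class PositionCoefficients {

    // position_linear (default): 1 - 0.10 * position; negative beyond position 10
    static double linear(int position) {
        return 1 - 0.10 * position;
    }

    // position_reciprocal: 1 / (1 + position)
    static double reciprocal(int position) {
        return 1.0 / (1 + position);
    }

    // position_exponential_reciprocal: 1 / (1 + position)^exponent
    static double exponentialReciprocal(int position, double exponent) {
        return 1.0 / Math.pow(1 + position, exponent);
    }

    public static void main(String[] args) {
        double exponent = 2.0; // example value only
        for (int position : new int[] {0, 1, 2, 5, 10, 15}) {
            System.out.printf("position=%2d linear=%6.3f reciprocal=%.3f exponential=%.4f%n",
                    position,
                    linear(position),
                    reciprocal(position),
                    exponentialReciprocal(position, exponent));
        }
    }
}
```

At position 15 the linear value is -0.5, which is the violation of the 0<=x<=1 constraint noted above; the two reciprocal variants stay in (0, 1].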

To answer your questions:

1) "turning weight=0 into 1": this is an interesting one.
You don't want all your weights to be 0 for the BlendedInfixSuggester,
because that would flatten the positional score of every suggestion to 0,
and the positional score is the only reason to use the Blended Infix (if you
are not interested in the positional score of the suggestions, you should use
the parent suggester: AnalyzingInfixSuggester).
If you don't configure the weight field (which is not, and shouldn't be,
mandatory), all your weights go to 0
(org.apache.lucene.search.suggest.DocumentDictionary.DocumentInputIterator#getWeight)
and your BlendedInfixSuggester doesn't blend anything anymore, scoring every
suggestion a constant 0.
That was the reason to move weight 0 to the next larger value (which for a
long is 1).
With that fix you limit the ability of a user to push certain suggestions down
to weight 0 (they can only drop them to weight 1), but you gain a good bug fix
for the missing weight field scenario.
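A minimal Java sketch of the weight=0 to 1 mapping described in point 1, assuming it is applied to each weight before blending. `effectiveWeight` and `blend` are hypothetical helpers, not Lucene APIs, and the product is kept as a double here so that the effect of the weight is not mixed up with the separate long-truncation problem.

```java
// Hypothetical sketch: map a missing (0) weight to 1 so the position still matters.
final class ZeroWeightFix {

    // 0 usually means "no weight field configured"; 1 is the smallest positive long.
    static long effectiveWeight(long weight) {
        return weight == 0 ? 1 : weight;
    }

    // Product kept as a double to isolate the effect of the weight.
    static double blend(long weight, double coefficient) {
        return weight * coefficient;
    }

    public static void main(String[] args) {
        double earlyMatch = 1.0 / (1 + 0); // reciprocal coefficient, position 0
        double lateMatch = 1.0 / (1 + 3);  // reciprocal coefficient, position 3

        // Without the fix both suggestions score 0 and cannot be ranked.
        System.out.println(blend(0, earlyMatch) + " vs " + blend(0, lateMatch)); // 0.0 vs 0.0
        // With the fix the earlier match ranks higher.
        System.out.println(blend(effectiveWeight(0), earlyMatch) + " vs "
                + blend(effectiveWeight(0), lateMatch)); // 1.0 vs 0.25
    }
}
```

The trade-off mentioned in the comment is visible here: a weight explicitly set to 0 becomes indistinguishable from a weight of 1.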

2) The choice of 10 was completely arbitrary: it gives at least 10 possible
ranked outcomes from the positional coefficient.
You may end up with overflows if:
 - the weight is already big enough.
You are right, maybe we can apply that scaling factor only if the weight is
small.
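One way to read the suggestion in point 2, sketched in Java: scale the weight by the arbitrary factor of 10 only when the multiplication cannot overflow a long. Treating "small" as "at most Long.MAX_VALUE / 10" is an assumption, as are the names `ConditionalScaling`, `SCALING_FACTOR` and `scaledWeight`; the actual patch may define the threshold differently.

```java
// Hypothetical sketch: apply the scaling factor only when it cannot overflow.
final class ConditionalScaling {

    // The arbitrary factor of 10 discussed in the comment.
    static final long SCALING_FACTOR = 10;

    // Assumption: "small" means the scaled weight still fits in a long.
    static long scaledWeight(long weight) {
        if (weight >= 0 && weight <= Long.MAX_VALUE / SCALING_FACTOR) {
            return weight * SCALING_FACTOR;
        }
        return weight; // large (or negative) weights are left unscaled
    }

    // Long score as in the issue description: the cast truncates toward zero.
    static long score(long weight, double coefficient) {
        return (long) (scaledWeight(weight) * coefficient);
    }

    public static void main(String[] args) {
        // With scaling, weight 1 yields distinct scores for nearby positions (linear coefficient).
        for (int position = 0; position <= 3; position++) {
            double coefficient = 1 - 0.10 * position;
            System.out.println("position=" + position + " score=" + score(1, coefficient));
        }
        System.out.println(scaledWeight(Long.MAX_VALUE) == Long.MAX_VALUE); // left unscaled, no overflow
    }
}
```

The bound check avoids the overflow case "the weight is already big enough" by simply skipping the scaling for such weights.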


According to my analysis, the overflow cannot come from the coefficient,
because the edge cases for linear are:
 * 1, where the input position is 0
 * about -2.147483637E8 (1 - 0.10 * Integer.MAX_VALUE), where the input position is
[Integer.MAX_VALUE|http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Integer.html#MAX_VALUE]
(which is not going to be achievable, as the full length of a String is capped at
that value)
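The two edge values for linear can be checked with a short Java sketch of the formula (`LinearEdgeCases` and `linear` are hypothetical names). Both results are ordinary finite doubles of modest magnitude, which is the point made above: the coefficient by itself does not overflow.

```java
// Hypothetical check of the linear coefficient at its two extreme positions.
final class LinearEdgeCases {

    // position_linear: 1 - 0.10 * position (the int position is promoted to double)
    static double linear(int position) {
        return 1 - 0.10 * position;
    }

    public static void main(String[] args) {
        System.out.println(linear(0));                 // upper edge: 1.0
        System.out.println(linear(Integer.MAX_VALUE)); // lower edge: about -2.147483637E8
    }
}
```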


was (Author: alessandro.benedetti):
Hi [~jpountz],

Thanks for your time. Here is a quick explanation:

The (positional) coefficient should be a double 0<=x<=1, calculated with one of
3 possible formulas from the position of the first matching query term in the
suggestion (linear doesn't respect that constraint and goes negative for
positions more than 10 from the beginning):
 * *position_linear*: (1 - 0.10*position): Matches closer to the start are given
a higher score (Default)
 * *position_reciprocal*: 1/(1+position): Matches closer to the start are given
a score that decays faster than linear
 * *position_exponential_reciprocal*: 1/pow(1+position,exponent): Matches closer
to the start are given a score that decays faster than reciprocal

To answer your questions:

1) "turning weight=0 into 1": this is an interesting one.
You don't want all your weights to be 0 for the BlendedInfixSuggester,
because that would flatten the positional score of every suggestion to 0,
and the positional score is the only reason to use the Blended Infix (if you
are not interested in the positional score of the suggestions, you should use
the parent suggester: AnalyzingInfixSuggester).
If you don't configure the weight field (which is not, and shouldn't be,
mandatory), all your weights go to 0
(org.apache.lucene.search.suggest.DocumentDictionary.DocumentInputIterator#getWeight)
and your BlendedInfixSuggester doesn't blend anything anymore, scoring every
suggestion a constant 0.
That was the reason to move weight 0 to the next larger value (which for a
long is 1).
With that fix you limit the ability of a user to push certain suggestions down
to weight 0 (they can only drop them to weight 1), but you gain a good bug fix
for the missing weight field scenario.

2) The choice of 10 was completely arbitrary: it gives at least 10 possible
ranked outcomes from the positional coefficient.
You may end up with overflows if:
- the weight is already big enough.
You are right, maybe we can apply that scaling factor only if the weight is
small.
- the linear coefficient goes deeply negative (we can limit the coefficient
score to a minimum of 0, which will also give Linear a behaviour similar

[jira] [Comment Edited] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504852#comment-16504852
 ] 

Alessandro Benedetti edited comment on LUCENE-8343 at 6/7/18 4:00 PM:
--

Hi [~ctargett],

I made the changes and pushed them, so they were only in the Pull Request
associated with the Jira issue:
[GitHub Pull Request #391|https://github.com/apache/lucene-solr/pull/391]
I have now uploaded the patch as well.

You can take a look now (I think the GitHub Pull Request is easier to read,
but feel free to use the patch at your convenience).


