[
https://issues.apache.org/jira/browse/LUCENE-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler reassigned LUCENE-124:
------------------------------------
Assignee: (was: Lucene Developers)
> Fuzzy Searches do not get a boost of 0.2 as stated in "Query Syntax" doc
> ------------------------------------------------------------------------
>
> Key: LUCENE-124
> URL: https://issues.apache.org/jira/browse/LUCENE-124
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Affects Versions: 1.2
> Environment: Operating System: All
> Platform: All
> Reporter: Cormac Twomey
> Priority: Minor
>
> According to the website's "Query Syntax" page, fuzzy searches are given a
> boost of 0.2. I've found this not to be the case, and have seen situations
> where
> exact matches have lower relevance scores than fuzzy matches.
> Rather than getting a boost of 0.2, it appears that all variations on the term
> are first found in the model, where dist* > 0.5.
> * dist = levenshteinDistance / length of min(termlength, variantlength)
> This then leads to a boolean OR search of all the variant terms, each of whose
> boost is set to (dist - 0.5)*2 for that variant.
> The upshot of all of this is that there are many cases where a fuzzy match
> will
> get a higher relevance score than an exact match.
> See this email for a test case to reproduce this anomalous behaviour.
> http://www.mail-archive.com/[email protected]/msg02819.html
> Here is a candidate patch to address the issue -
> *** lucene-1.2\src\java\org\apache\lucene\search\FuzzyTermEnum.java Sun Jun
> 09
> 13:47:54 2002
> --- lucene-1.2-modified\src\java\org\apache\lucene\search\FuzzyTermEnum.java
> Fri
> Mar 14 11:37:20 2003
> ***************
> *** 99,105 ****
> }
>
> final protected float difference() {
> ! return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
> }
>
> final public boolean endEnum() {
> --- 99,109 ----
> }
>
> final protected float difference() {
> ! if (distance == 1.0) {
> ! return 1.0f;
> ! }
> ! else
> ! return (float)((distance - FUZZY_THRESHOLD) *
> SCALE_FACTOR);
> }
>
> final public boolean endEnum() {
> ***************
> *** 111,117 ****
> ******************************/
>
> public static final double FUZZY_THRESHOLD = 0.5;
> ! public static final double SCALE_FACTOR = 1.0f / (1.0f -
> FUZZY_THRESHOLD);
>
> /**
> Finds and returns the smallest of three integers
> --- 115,121 ----
> ******************************/
>
> public static final double FUZZY_THRESHOLD = 0.5;
> ! public static final double SCALE_FACTOR = 0.2f * (1.0f / (1.0f -
> FUZZY_THRESHOLD));
>
> /**
> Finds and returns the smallest of three integers
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]