The decimal similarity is translated into a number of characters, based on 
the term length. For a four-character term it will be 1, 2, 3, or 4, which 
correspond to 0.25, 0.50, 0.75, or 1.00. Your 0.6 is getting rounded up to 
0.75, which means three out of four characters must match. With 0.5, only two 
out of four characters must match.

(Note: This is not a precise description of fuzzy matching, but close enough to 
explain the issue here.)
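
As a rough illustration of the rounding described above, here is a minimal 
sketch in Java. It models only the simplified rule in this message (similarity 
times term length, rounded up to whole characters); it is not Lucene's actual 
code, and the class and method names are made up for the example.

```java
public class SimilarityRounding {
    // Hypothetical helper: number of characters that must match under the
    // simplified model above (similarity * term length, rounded up).
    static int requiredMatches(double similarity, int termLength) {
        return (int) Math.ceil(similarity * termLength);
    }

    public static void main(String[] args) {
        int len = "BANK".length(); // 4 characters
        System.out.println(requiredMatches(0.6, len)); // prints 3
        System.out.println(requiredMatches(0.5, len)); // prints 2
    }
}
```

Under this model, 0.6 on a four-character term demands three matching 
characters, while 0.5 demands only two.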

Also, decimal similarity for fuzzy queries is deprecated in favor of specifying 
the edit distance, so you should be using ~1 or ~2 – only 0, 1, and 2 are 
supported.
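
To get a feel for what an edit distance of 1 or 2 means for your example, 
here is a small self-contained sketch that computes plain Levenshtein distance 
(insertions, deletions, and substitutions only). It assumes a swap of two 
adjacent letters is not counted as a single edit; whether your Lucene version 
treats transpositions that way is something to check. The class and method 
names are made up for the example.

```java
public class EditDistanceSketch {
    // Plain Levenshtein distance using two rolling rows.
    // Assumption: transpositions are NOT a single edit here.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's first i characters
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays for the next row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("REFHA", "REFAH")); // prints 2
        System.out.println(levenshtein("BANK", "BANK"));   // prints 0
    }
}
```

Under that assumption, inverting the last two letters costs two edits, so 
REFHA~2 would be within range of REFAH while REFHA~1 would not.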

-- Jack Krupansky

From: Fabian Vigna 
Sent: Wednesday, November 27, 2013 9:55 AM
To: dev@lucene.apache.org 
Subject: Similarity - No Match

Hello everybody,

The case I have pending pertains to BANK REFAH. If you enter BANK REFHA, 
inverting the last two letters, it does not find a match with similarity 0.6. 
It does find it with similarity 0.5.

(REFHA~0.6 AND BANK~0.6)

My question is: why does it not find a match when only the last two letters 
are inverted?

Thanks!

Fabian
