[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427254#comment-13427254
 ] 

Johannes Christen edited comment on LUCENE-4282 at 8/2/12 12:01 PM:
--------------------------------------------------------------------

Well, I think I found the solution.
You were right, Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum 
class.
The similarity calculated in the accept() method is based on the smaller of 
the two lengths, that of the request term and that of the index term.
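
To illustrate why the choice of length matters, here is a minimal sketch. It 
assumes the similarity has the form 1 - edits / length and that the request 
term is WEBER; both are assumptions made for illustration. The class and 
method names are hypothetical and are not taken from FuzzyTermsEnum or from 
the attached file.

```java
// Hypothetical sketch, not the Lucene code: it only shows how the length
// used for normalization changes the similarity value.
final class SimilaritySketch {

    // Assumed formula: similarity = 1 - edits / referenceLength.
    static float similarity(int edits, int referenceLength) {
        return 1.0f - (float) edits / (float) referenceLength;
    }

    // Normalized by the shorter of request term and index term.
    static float byShorterTerm(int edits, int requestLength, int indexLength) {
        return similarity(edits, Math.min(requestLength, indexLength));
    }

    // Alternative for comparison: normalized by the request term length only.
    static float byRequestTerm(int edits, int requestLength) {
        return similarity(edits, requestLength);
    }

    public static void main(String[] args) {
        int requestLength = "WEBER".length(); // assumed request term
        String[] indexTerms = {"WEBE", "WEB", "WBR"};
        int[] edits = {1, 2, 2}; // edit distances from WEBER

        for (int i = 0; i < indexTerms.length; i++) {
            int indexLength = indexTerms[i].length();
            System.out.println(indexTerms[i]
                + " shorter-term=" + byShorterTerm(edits[i], requestLength, indexLength)
                + " request-term=" + byRequestTerm(edits[i], requestLength));
        }
    }
}
```

Under these assumptions, the short index terms WEB and WBR get a noticeably 
lower similarity when the shorter length is used, although their edit 
distance is still within 2.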

I attached my ModifiedFuzzyTermsEnum class, where you can find the 
modification which makes it work.
BTW, there are some more modifications that fix bugs in calculating the 
similarity from the edit distance and vice versa.
The modification of the boost factor was only necessary for my boolean address 
search approach and possibly doesn't apply here.
The modified parts are marked with USERCODE_BEGIN and USERCODE_END tags.



                



                  
> Automaton Fuzzy Query doesn't deliver all results
> -------------------------------------------------
>
>                 Key: LUCENE-4282
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4282
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Johannes Christen
>            Assignee: Robert Muir
>              Labels: newbie
>         Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Given a small index with n documents, where each document contains one of 
> the following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR, 
> which have an edit distance of 2 and are therefore also within maxEdits, are 
> missing.
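
The edit distances in the report can be checked with a small sketch. It 
assumes the query term is WEBER (the report does not state it explicitly) and 
uses plain Levenshtein distance without transpositions. The class name is 
hypothetical and the code is not part of Lucene.

```java
// Hypothetical helper for checking the distances named in the report.
final class EditDistanceSketch {

    // Classic two-row dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "WEBER"; // assumed query term
        for (String term : new String[] {"WEBER", "WEBE", "WEB", "WBR", "WE"}) {
            System.out.println(term + " -> " + distance(query, term));
        }
    }
}
```

With WEBER as the query term, WEB and WBR are both at distance 2, so a query 
with maxEdits=2 would be expected to match them, while WE is at distance 3.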


        
