[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427254#comment-13427254 ]
Johannes Christen edited comment on LUCENE-4282 at 8/2/12 12:01 PM:
--------------------------------------------------------------------

Well, I think I found the solution. You were right, Uwe.
It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum class.
The similarity calculation in the accept() method is based on the offset of the smaller length of the request term and the index term.
I attached my ModifiedFuzzyTermsEnum class, where you can find the modification which makes it work.
BTW, there are some more modifications, fixing bugs in calculating the similarity from the edit distance and vice versa.
The modification of the boost factor was only necessary for my boolean address search approach and possibly doesn't apply here.
The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags.

was (Author: superjo):
Well, I think I found the solution. You were right, Uwe.
It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum class.
The similarity calculation in the accept() method is based on the offset of the smaller length of the request term and the index term.
I will attach my ModifiedFuzzyTermsEnum class, where you can find the modification which makes it work.
BTW, there are some more modifications, fixing bugs in calculating the similarity from the edit distance and vice versa.
The modification of the boost factor was only necessary for my boolean address search approach and possibly doesn't apply here.
The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags.
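To illustrate the effect described in the comment, here is a minimal sketch of how normalizing by the smaller of the two term lengths could reject terms that are within maxEdits. The formulas, the ">=" comparison, and all names (SimilaritySketch, similarityByMinLength, minSimilarity, acceptsByMinLength, acceptsByEditDistance) are assumptions made for illustration. They are not taken from the Lucene source or from the attached ModifiedFuzzyTermsEnum, which may work differently.

```java
// Hypothetical illustration only; not the Lucene implementation.
final class SimilaritySketch {

    // ASSUMPTION: similarity = 1 - edits / min(queryLength, termLength),
    // i.e. normalized by the smaller of the two lengths, as the comment describes.
    static float similarityByMinLength(int edits, int queryLen, int termLen) {
        return 1.0f - (float) edits / Math.min(queryLen, termLen);
    }

    // ASSUMPTION: the similarity threshold is derived from maxEdits
    // using the query term length only.
    static float minSimilarity(int maxEdits, int queryLen) {
        return 1.0f - (float) maxEdits / queryLen;
    }

    // Accept check that compares the min-length similarity against the threshold.
    static boolean acceptsByMinLength(int edits, int maxEdits, int queryLen, int termLen) {
        return similarityByMinLength(edits, queryLen, termLen) >= minSimilarity(maxEdits, queryLen);
    }

    // One conceivable alternative: compare the edit distance directly.
    static boolean acceptsByEditDistance(int edits, int maxEdits) {
        return edits <= maxEdits;
    }

    public static void main(String[] args) {
        int queryLen = "WEBER".length(); // request term from the issue description
        int maxEdits = 2;
        // Index terms from the issue, with their edit distance to WEBER.
        String[] terms = {"WEBER", "WEBE", "WEB", "WBR"};
        int[] edits = {0, 1, 2, 2};
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i]
                + " minLength=" + acceptsByMinLength(edits[i], maxEdits, queryLen, terms[i].length())
                + " editDistance=" + acceptsByEditDistance(edits[i], maxEdits));
        }
    }
}
```

Under these assumptions, WEB and WBR fail the min-length check (1 - 2/3 is below the threshold 1 - 2/5) although both are within two edits of WEBER, which matches the behaviour reported in the issue.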
> Automaton Fuzzy Query doesn't deliver all results
> -------------------------------------------------
>
>                 Key: LUCENE-4282
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4282
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Johannes Christen
>            Assignee: Robert Muir
>              Labels: newbie
>         Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR,
> which have an edit distance of 2 as well, are missing.
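The edit distances claimed in the issue description can be checked with a plain Levenshtein computation. The sketch below is a standalone helper (the class and method names are hypothetical) and does not use any Lucene API. It counts insertions, deletions, and substitutions only.

```java
// Standalone check of the edit distances between WEBER and the index terms.
final class EditDistanceCheck {

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "WEBER";
        for (String term : new String[] {"WEBER", "WEBE", "WEB", "WBR", "WE"}) {
            System.out.println(term + " -> " + levenshtein(query, term));
        }
    }
}
```

WEB and WBR both come out at distance 2 from WEBER, so with maxEdits=2 they would be expected in the rewritten query, while WE is at distance 3 and would not.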