[jira] Commented: (LUCENE-329) Fuzzy query scoring issues
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647175#action_12647175 ]

Alexey Lef commented on LUCENE-329:
-----------------------------------

I've been using DisjunctionMaxQuery for term expansion. It seems to be a much more natural fit for this kind of problem. BooleanQuery never worked for me even with disableCoord.

Fuzzy query scoring issues
--------------------------

                Key: LUCENE-329
                URL: https://issues.apache.org/jira/browse/LUCENE-329
            Project: Lucene - Java
         Issue Type: Bug
         Components: Search
  Affects Versions: 1.2rc5
        Environment: Operating System: All
                     Platform: All
           Reporter: Mark Harwood
           Assignee: Lucene Developers
           Priority: Minor
        Attachments: patch.txt

Queries which automatically produce multiple terms (wildcard, range, prefix, fuzzy etc.) currently suffer from two problems:

1) Scores for matching documents are significantly smaller than term queries because of the volume of terms introduced (a match on query Foo~ is 0.1 whereas a match on query Foo is 1).

2) The rarer forms of expanded terms are favoured over those of more common forms because of the IDF. When using fuzzy queries, for example, rare misspellings typically appear in results before the more common correct spellings.

I will attach a patch that corrects the issues identified above by

1) Overriding Similarity.coord to counteract the downplaying of scores introduced by expanding terms.

2) Taking the IDF factor of the most common form of expanded terms as the basis of scoring all other expanded terms.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
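The two problems in the issue, and the DisjunctionMaxQuery alternative from the comment, can be illustrated with a little arithmetic. The following is a minimal, Lucene-free sketch, not the attached patch: the class and method names are hypothetical, the idf formula is assumed to be the classic 1 + ln(numDocs / (docFreq + 1)) form, the document frequencies are invented, and the disjunction-max combination assumes a tie-breaker of zero.

```java
import java.util.Arrays;

public class ExpansionScoringSketch {

    // Coordination factor: fraction of the query's clauses that matched.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // Inverse document frequency (assumed classic formula): rarer terms score higher.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Proposal 2 in the issue: use the IDF of the most common expanded form
    // (largest document frequency) for every expanded term.
    static double sharedIdf(int[] docFreqs, int numDocs) {
        int maxDocFreq = Arrays.stream(docFreqs).max().orElse(0);
        return idf(maxDocFreq, numDocs);
    }

    // Disjunction-max combination with a zero tie-breaker (assumption):
    // the best matching clause decides the score, with no coord penalty.
    static double disjunctionMax(double[] clauseScores) {
        return Arrays.stream(clauseScores).max().orElse(0.0);
    }

    public static void main(String[] args) {
        int numDocs = 1000;          // hypothetical index size
        int[] docFreqs = {200, 3};   // hypothetical: a common spelling and a rare misspelling

        // Problem 1: one match out of ten expanded terms is scaled down by coord.
        System.out.println("coord(1 of 10) = " + coord(1, 10));

        // Problem 2: the rare form receives the larger idf.
        System.out.println("idf(common)    = " + idf(docFreqs[0], numDocs));
        System.out.println("idf(rare)      = " + idf(docFreqs[1], numDocs));

        // Proposed fix: every expanded term shares the common form's idf.
        System.out.println("shared idf     = " + sharedIdf(docFreqs, numDocs));
    }
}
```

With ten expanded terms and a single match, coord is 1/10, which is consistent with the 0.1 versus 1 scores quoted in the issue. Taking the maximum clause score instead of a coord-scaled sum avoids that penalty, which may be why DisjunctionMaxQuery feels like a more natural fit for term expansion.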
[jira] Updated: (LUCENE-789) Custom similarity is ignored when using MultiSearcher
[ https://issues.apache.org/jira/browse/LUCENE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Lef updated LUCENE-789:
------------------------------

    Attachment: TestMultiSearcherSimilarity.java

Attached unit test.

Custom similarity is ignored when using MultiSearcher
-----------------------------------------------------

                Key: LUCENE-789
                URL: https://issues.apache.org/jira/browse/LUCENE-789
            Project: Lucene - Java
         Issue Type: Bug
         Components: Search
  Affects Versions: 2.0.1
           Reporter: Alexey Lef
        Attachments: TestMultiSearcherSimilarity.java

Symptoms:
I am using Searcher.setSimilarity() to provide a custom similarity that turns off the tf() factor. However, somewhere along the way the custom similarity is ignored and DefaultSimilarity is used. I am using MultiSearcher and BooleanQuery.

Problem analysis:
The problem seems to be in the MultiSearcher.createWeight(Query) method. It creates an instance of CachedDfSource but does not set the similarity. As a result, CachedDfSource provides DefaultSimilarity to queries that use it.

Potential solution:
Adding the following line:

    cacheSim.setSimilarity(getSimilarity());

after creating the instance of CachedDfSource (line 312) seems to fix the problem. However, I don't understand enough of the inner workings of this class to be absolutely sure that this is the right thing to do.
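The failure mode described in the problem analysis can be modelled without Lucene. The sketch below is a simplified stand-in, not the MultiSearcher source: every class and method name in it is hypothetical, the default tf() is assumed to be the square root of the term frequency, and the `propagate` flag exists only to show the behaviour with and without the proposed line.

```java
public class SimilarityPropagationSketch {

    // Stand-in for a similarity; only the tf() factor matters here.
    interface Similarity {
        double tf(int freq);
    }

    // Assumed default behaviour: tf grows with the square root of the frequency.
    static final Similarity DEFAULT = freq -> Math.sqrt(freq);

    // Custom similarity that turns the tf() factor off, as in the report.
    static final Similarity NO_TF = freq -> freq > 0 ? 1.0 : 0.0;

    // Stand-in for a searcher that carries a configurable similarity.
    static class SearcherModel {
        private Similarity similarity = DEFAULT;

        void setSimilarity(Similarity s) { similarity = s; }

        Similarity getSimilarity() { return similarity; }
    }

    // Stand-in for the multi-searcher: it builds an internal source object
    // that the queries consult for their similarity.
    static class MultiSearcherModel extends SearcherModel {
        SearcherModel createSource(boolean propagate) {
            SearcherModel cacheSim = new SearcherModel();
            if (propagate) {
                // The one-line fix proposed in the issue.
                cacheSim.setSimilarity(getSimilarity());
            }
            return cacheSim;
        }
    }

    public static void main(String[] args) {
        MultiSearcherModel searcher = new MultiSearcherModel();
        searcher.setSimilarity(NO_TF);

        // Without propagation the internal source still uses the default tf().
        System.out.println(searcher.createSource(false).getSimilarity().tf(4));
        // With propagation the custom similarity reaches the queries.
        System.out.println(searcher.createSource(true).getSimilarity().tf(4));
    }
}
```

In this model the internal source silently falls back to the default similarity unless the outer searcher's similarity is copied onto it, which mirrors the symptom reported above. Whether the same one-line change is the right fix in the real class is left open, as the reporter notes.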