[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2008-11-12 Thread Alexey Lef (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647175#action_12647175
 ] 

Alexey Lef commented on LUCENE-329:
---

I've been using DisjunctionMaxQuery for term expansion. It seems to be a much 
more natural fit for this kind of problem. BooleanQuery never worked for me 
even with disableCoord.
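
To make the difference concrete, here is a minimal sketch of the two ways of combining clause scores. It is a simplified model in plain Java, not Lucene's scorer code: the class and method names are made up for illustration, and query normalization and boosts are ignored. It assumes BooleanQuery-style scoring is the sum of matching clause scores multiplied by coord (matched clauses / total clauses), and DisjunctionMaxQuery-style scoring is the best clause score plus tieBreaker times the sum of the remaining ones.

```java
import java.util.List;

/** Simplified scoring model for expanded terms; hypothetical names, not Lucene code. */
final class ExpansionScoring {

    /** BooleanQuery-style: sum of matching clause scores, scaled by coord = matched / total. */
    static double sumWithCoord(List<Double> matchedScores, int totalClauses) {
        if (totalClauses <= 0) {
            throw new IllegalArgumentException("totalClauses must be positive");
        }
        double sum = 0.0;
        for (double s : matchedScores) {
            sum += s;
        }
        return sum * matchedScores.size() / totalClauses;
    }

    /** DisjunctionMaxQuery-style: best clause score plus tieBreaker times the other scores. */
    static double disjunctionMax(List<Double> matchedScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : matchedScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }
}
```

In this model, a document that matches one of ten expanded terms with a raw clause score of 1.0 is scaled down to a tenth of that by sumWithCoord, while disjunctionMax with a tieBreaker of 0 leaves the score at 1.0 no matter how many terms the expansion produced.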

 Fuzzy query scoring issues
 --

 Key: LUCENE-329
 URL: https://issues.apache.org/jira/browse/LUCENE-329
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 1.2rc5
 Environment: Operating System: All
 Platform: All
Reporter: Mark Harwood
Assignee: Lucene Developers
Priority: Minor
 Attachments: patch.txt


 Queries that automatically expand into multiple terms (wildcard, range, prefix,
 fuzzy, etc.) currently suffer from two problems:
 1) Scores for matching documents are significantly lower than for term queries
 because of the number of terms introduced (a match on query Foo~ scores 0.1,
 whereas a match on query Foo scores 1).
 2) Rarer forms of expanded terms are favoured over more common forms because
 of the IDF. With fuzzy queries, for example, rare misspellings typically
 appear in results before the more common correct spellings.
 I will attach a patch that corrects the issues identified above by
 1) Overriding Similarity.coord to counteract the downplaying of scores
 introduced by expanding terms.
 2) Taking the IDF factor of the most common form of expanded terms as the
 basis of scoring all other expanded terms.
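
 The IDF effect in problem 2 and the idea behind fix 2 can be sketched as follows. This is an illustration only, not the contents of patch.txt. It assumes the classic DefaultSimilarity formula idf = ln(numDocs / (docFreq + 1)) + 1; the class name, method names and document frequencies are made up.

```java
/** Illustrative IDF handling for expanded terms; hypothetical names, not the attached patch. */
final class ExpandedTermIdf {

    /** Classic DefaultSimilarity-style idf: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /** IDF of the most common expanded term (largest docFreq), to be shared by all variants. */
    static double sharedIdf(int[] docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int df : docFreqs) {
            maxDocFreq = Math.max(maxDocFreq, df);
        }
        return idf(maxDocFreq, numDocs);
    }
}
```

 With per-term IDF, a misspelling that occurs in 1 of 1000 documents gets a larger idf than the correct spelling that occurs in 500 of them, so it ranks first. With sharedIdf, every variant is weighted by the idf of the 500-document term, which removes that advantage.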

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-789) Custom similarity is ignored when using MultiSearcher

2007-04-05 Thread Alexey Lef (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Lef updated LUCENE-789:
--

Attachment: TestMultiSearcherSimilarity.java

Attached unit test

 Custom similarity is ignored when using MultiSearcher
 -

 Key: LUCENE-789
 URL: https://issues.apache.org/jira/browse/LUCENE-789
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.0.1
Reporter: Alexey Lef
 Attachments: TestMultiSearcherSimilarity.java


 Symptoms:
 I am using Searcher.setSimilarity() to provide a custom similarity that turns
 off the tf() factor. However, somewhere along the way the custom similarity is
 ignored and DefaultSimilarity is used instead. I am using MultiSearcher and
 BooleanQuery.
 Problem analysis:
 The problem seems to be in the MultiSearcher.createWeight(Query) method. It
 creates an instance of CachedDfSource but does not set the similarity. As a
 result, CachedDfSource provides DefaultSimilarity to queries that use it.
 Potential solution:
 Adding the following line:
 cacheSim.setSimilarity(getSimilarity());
 after creating the instance of CachedDfSource (line 312) seems to fix the
 problem. However, I don't understand enough of the inner workings of this
 class to be absolutely sure that this is the right thing to do.
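
 The pattern described above can be reproduced in a toy model. Everything below is a hypothetical stand-in written for illustration (ToySearcher, ToyMultiSearcher, createCacheSource and the applyFix flag are not Lucene classes or methods); only the propagation line mirrors the proposed fix. The default tf is assumed to be sqrt(freq), as in DefaultSimilarity.

```java
/** Toy model of the similarity-propagation problem; none of these types are Lucene classes. */
final class SimilarityPropagation {

    /** Minimal stand-in for a similarity: only the tf() factor is modelled. */
    interface Sim {
        double tf(double freq);
    }

    /** Default behaviour: tf grows with the square root of the term frequency. */
    static final Sim DEFAULT_SIM = freq -> Math.sqrt(freq);

    /** Custom behaviour from the report: the tf factor is switched off. */
    static final Sim NO_TF_SIM = freq -> freq > 0 ? 1.0 : 0.0;

    static class ToySearcher {
        private Sim similarity = DEFAULT_SIM;

        void setSimilarity(Sim s) {
            similarity = s;
        }

        Sim getSimilarity() {
            return similarity;
        }
    }

    static class ToyMultiSearcher extends ToySearcher {
        /** Creates the helper searcher used for weighting, optionally applying the fix. */
        ToySearcher createCacheSource(boolean applyFix) {
            ToySearcher cacheSim = new ToySearcher();
            if (applyFix) {
                // The proposed one-line fix: hand the custom similarity to the helper.
                cacheSim.setSimilarity(getSimilarity());
            }
            return cacheSim;
        }
    }
}
```

 Without the propagation step, the helper silently falls back to DEFAULT_SIM even though NO_TF_SIM was set on the outer searcher, which is the behaviour the attached unit test is meant to expose in the real classes.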
