from:"Alexey Lef (JIRA)"

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2008-11-12 Thread Alexey Lef (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647175#action_12647175
]

Alexey Lef commented on LUCENE-329:
---

I've been using DisjunctionMaxQuery for term expansion. It seems to be a much
more natural fit for this kind of problem. BooleanQuery never worked for me
even with disableCoord.
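
To illustrate the difference, here is a minimal, self-contained sketch. It is
not Lucene code: the class and method names are hypothetical, per-clause
scores are made-up numbers, and query normalization is ignored. It assumes
only the documented combination rules, namely that a BooleanQuery sums the
scores of its matching clauses (scaled by coord unless coord is disabled) and
that a DisjunctionMaxQuery takes the best clause score plus a tie-breaker
multiple of the rest.

```java
// Simplified model of how per-clause scores are combined for an expanded query.
// All names here are hypothetical; this is not the Lucene implementation.
final class ExpansionCombineSketch {

    // BooleanQuery-style: sum of the matching clause scores, scaled by
    // coord = matchedClauses / totalClauses unless coord is disabled.
    static double booleanScore(double[] matchedScores, int totalClauses, boolean disableCoord) {
        double sum = 0.0;
        for (double s : matchedScores) {
            sum += s;
        }
        double coord = disableCoord ? 1.0 : (double) matchedScores.length / totalClauses;
        return sum * coord;
    }

    // DisjunctionMaxQuery-style: best clause score plus tieBreaker times the others.
    static double disMaxScore(double[] matchedScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : matchedScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] oneVariant = {0.8};        // document matches one expanded term
        double[] twoVariants = {0.8, 0.5};  // document matches two expanded terms

        System.out.println("boolean, coord on,  1 of 10: " + booleanScore(oneVariant, 10, false));
        System.out.println("boolean, coord off, 1 of 10: " + booleanScore(oneVariant, 10, true));
        System.out.println("boolean, coord off, 2 of 10: " + booleanScore(twoVariants, 10, true));
        System.out.println("dismax,  tie 0.0,   1 of 10: " + disMaxScore(oneVariant, 0.0));
        System.out.println("dismax,  tie 0.0,   2 of 10: " + disMaxScore(twoVariants, 0.0));
    }
}
```

In this model, disabling coord removes the penalty for the number of expanded
terms, but a document that happens to contain several variants is still
boosted by the sum. Taking the maximum scores each document by its single best
variant, which may be one reason it fits term expansion better.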

Fuzzy query scoring issues
--

Key: LUCENE-329
URL: https://issues.apache.org/jira/browse/LUCENE-329
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 1.2rc5
Environment: Operating System: All
Platform: All
Reporter: Mark Harwood
Assignee: Lucene Developers
Priority: Minor
Attachments: patch.txt

Queries which automatically produce multiple terms (wildcard, range, prefix,
fuzzy, etc.) currently suffer from two problems:
1) Scores for matching documents are significantly smaller than for term
queries because of the volume of terms introduced (a match on query Foo~
scores 0.1, whereas a match on query Foo scores 1).
2) The rarer forms of expanded terms are favoured over the more common forms
because of the IDF. With fuzzy queries, for example, rare misspellings
typically appear in results before the more common correct spellings.

I will attach a patch that corrects the issues identified above by:
1) Overriding Similarity.coord to counteract the downplaying of scores
introduced by expanding terms.
2) Taking the IDF factor of the most common form of the expanded terms as the
basis for scoring all other expanded terms.
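
The IDF effect in problem 2 can be sketched with a few lines of standalone
Java. This is an illustration, not the attached patch: the class name, the
sharedIdf helper, and the document frequencies are hypothetical. The idf
formula is assumed to be the one used by DefaultSimilarity in Lucene versions
of this period, 1 + ln(numDocs / (docFreq + 1)).

```java
// Standalone illustration of why rare expanded terms outrank common ones,
// and of basing every expanded term's IDF on the most common form.
// Names and numbers are hypothetical; this is not the patch itself.
final class ExpansionIdfSketch {

    // Assumed to match DefaultSimilarity.idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // IDF of the most common expanded form (largest document frequency),
    // to be shared by all expanded terms.
    static double sharedIdf(int[] docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int df : docFreqs) {
            maxDocFreq = Math.max(maxDocFreq, df);
        }
        return idf(maxDocFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 10000;
        int correctSpellingDf = 900; // common, correctly spelled form
        int misspellingDf = 2;       // rare misspelling

        System.out.println("idf(correct spelling) = " + idf(correctSpellingDf, numDocs));
        System.out.println("idf(rare misspelling) = " + idf(misspellingDf, numDocs));
        System.out.println("shared idf            = "
                + sharedIdf(new int[] {correctSpellingDf, misspellingDf}, numDocs));
    }
}
```

With per-term IDF the misspelling gets the larger weight; with the shared
value both forms are weighted as the common form would be.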

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Updated: (LUCENE-789) Custom similarity is ignored when using MultiSearcher

2007-04-05 Thread Alexey Lef (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alexey Lef updated LUCENE-789:
--

Attachment: TestMultiSearcherSimilarity.java

Attached unit test

Custom similarity is ignored when using MultiSearcher
-

Key: LUCENE-789
URL: https://issues.apache.org/jira/browse/LUCENE-789
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.0.1
Reporter: Alexey Lef
Attachments: TestMultiSearcherSimilarity.java

Symptoms:
I am using Searcher.setSimilarity() to provide a custom similarity that turns
off the tf() factor. However, somewhere along the way the custom similarity is
ignored and the DefaultSimilarity is used. I am using MultiSearcher and
BooleanQuery.
Problem analysis:
The problem seems to be in the MultiSearcher.createWeight(Query) method. It
creates an instance of CachedDfSource but does not set the similarity. As a
result, CachedDfSource provides DefaultSimilarity to queries that use it.
Potential solution:
Adding the following line:
cacheSim.setSimilarity(getSimilarity());
after creating an instance of CachedDfSource (line 312) seems to fix the
problem. However, I don't understand enough of the inner workings of this
class to be absolutely sure that this is the right thing to do.
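
The failure mode can be modelled without Lucene on the classpath. The sketch
below is a stand-in, not the MultiSearcher source: every class and method name
in it is hypothetical, and SearcherModel merely plays the role of the
CachedDfSource instance. It assumes only what is described above, that the
helper searcher starts out with the default similarity and keeps it unless the
caller copies its own similarity across.

```java
// Stand-in model of the reported behaviour; all names are hypothetical.
final class SimilarityPropagationSketch {

    interface Sim {
        String name();
    }

    static final Sim DEFAULT = () -> "default";

    // Plays the role of a searcher that falls back to the default similarity.
    static class SearcherModel {
        private Sim similarity = DEFAULT;

        void setSimilarity(Sim s) {
            similarity = s;
        }

        Sim getSimilarity() {
            return similarity;
        }
    }

    // Plays the role of MultiSearcher creating a helper searcher for weighting.
    static class MultiSearcherModel extends SearcherModel {
        Sim similarityUsedForWeight(boolean applyFix) {
            SearcherModel cacheSim = new SearcherModel(); // stands in for CachedDfSource
            if (applyFix) {
                cacheSim.setSimilarity(getSimilarity()); // the proposed one-line fix
            }
            return cacheSim.getSimilarity();
        }
    }

    public static void main(String[] args) {
        Sim noTf = () -> "custom (tf disabled)";
        MultiSearcherModel searcher = new MultiSearcherModel();
        searcher.setSimilarity(noTf);

        System.out.println("without fix: " + searcher.similarityUsedForWeight(false).name());
        System.out.println("with fix:    " + searcher.similarityUsedForWeight(true).name());
    }
}
```

The attached TestMultiSearcherSimilarity.java exercises the real classes; this
model only shows why the custom similarity never reaches the queries unless it
is set on the helper explicitly.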
