MoreLikeThis: Apply field level boosts before query terms are selected
----------------------------------------------------------------------
Key: SOLR-2304
URL: https://issues.apache.org/jira/browse/SOLR-2304
Project: Solr
Issue Type: Improvement
Components: MoreLikeThis
Affects Versions: 1.4.2
Reporter: Mike Mattozzi
Priority: Minor
Fix For: 1.4.2
MoreLikeThis provides the ability to set field level boosts to weight the
importance of fields in selecting similar documents. Currently, in trunk, these
field level boosts are applied after the query terms have been selected from
the priority queue of interesting terms in MoreLikeThis. This can give
unexpected results when used in combination with mlt.maxqt to limit the number
of query terms. For example, suppose you use fields fieldA and fieldB, boosted
as "fieldA^0.5 fieldB^2.0", with a maxqt parameter of 20. If the terms in
fieldA have higher tf-idf scores than the terms in fieldB, only 20 fieldA terms
will be selected as the basis for the MoreLikeThis query, even if, after
boosting, there are terms in fieldB with a higher overall score.
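
The ordering problem described above can be sketched in isolation. The following
is only an illustration, not the actual MoreLikeThis code or the attached patch:
the class BoostOrderSketch, the Term type, and the methods selectThenBoost and
boostThenSelect are hypothetical names, and the scores are made up. A maxqt of 2
stands in for 20 to keep the data small.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BoostOrderSketch {

    /** Hypothetical candidate term: field name, term text, and tf-idf style score. */
    public static final class Term {
        public final String field;
        public final String text;
        public final double score;

        public Term(String field, String text, double score) {
            this.field = field;
            this.text = text;
            this.score = score;
        }

        /** Returns a copy with the score multiplied by the field's boost (1.0 if none). */
        public Term boosted(Map<String, Double> boosts) {
            return new Term(field, text, score * boosts.getOrDefault(field, 1.0));
        }
    }

    /** Keeps the maxQt highest-scoring terms, highest first. */
    static List<Term> top(List<Term> terms, int maxQt) {
        return terms.stream()
                .sorted((a, b) -> Double.compare(b.score, a.score))
                .limit(maxQt)
                .collect(Collectors.toList());
    }

    /** Behavior described in the issue: truncate to maxQt first, boost afterwards. */
    public static List<Term> selectThenBoost(List<Term> terms,
                                             Map<String, Double> boosts, int maxQt) {
        return top(terms, maxQt).stream()
                .map(t -> t.boosted(boosts))
                .collect(Collectors.toList());
    }

    /** Proposed behavior: boost every candidate first, then truncate to maxQt. */
    public static List<Term> boostThenSelect(List<Term> terms,
                                             Map<String, Double> boosts, int maxQt) {
        List<Term> boosted = terms.stream()
                .map(t -> t.boosted(boosts))
                .collect(Collectors.toList());
        return top(boosted, maxQt);
    }

    public static void main(String[] args) {
        // Made-up unboosted scores: fieldA terms outrank fieldB terms.
        List<Term> terms = Arrays.asList(
                new Term("fieldA", "a1", 5.0),
                new Term("fieldA", "a2", 4.0),
                new Term("fieldB", "b1", 3.0),
                new Term("fieldB", "b2", 2.0));

        // Boosts corresponding to "fieldA^0.5 fieldB^2.0".
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("fieldA", 0.5);
        boosts.put("fieldB", 2.0);

        for (Term t : selectThenBoost(terms, boosts, 2)) {
            System.out.println("select-then-boost: " + t.field + ":" + t.text);
        }
        for (Term t : boostThenSelect(terms, boosts, 2)) {
            System.out.println("boost-then-select: " + t.field + ":" + t.text);
        }
    }
}
```

With this sample data, selecting before boosting keeps only the two fieldA
terms, while boosting before selecting keeps only the two fieldB terms, whose
boosted scores (6.0 and 4.0) exceed those of fieldA (2.5 and 2.0).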

I encountered this while using document descriptive text and document tags
(comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the tags
higher; however, the less common document text terms were always selected as
the query terms, while the more common tag terms were eliminated by the maxqt
parameter before their scores were boosted.

I believe the code was originally written this way so that the bulk of the
work could be done in the MoreLikeThisHandler without modifying the
MoreLikeThis class in the Lucene project. Now that the projects are merged, I
think this modification makes sense. I will be attaching a simple patch to
trunk.