MoreLikeThis: Apply field level boosts before query terms are selected
----------------------------------------------------------------------

                 Key: SOLR-2304
                 URL: https://issues.apache.org/jira/browse/SOLR-2304
             Project: Solr
          Issue Type: Improvement
          Components: MoreLikeThis
    Affects Versions: 1.4.2
            Reporter: Mike Mattozzi
            Priority: Minor
             Fix For: 1.4.2


MoreLikeThis provides the ability to set field level boosts to weight the 
importance of fields in selecting similar documents. Currently, in trunk, these 
field level boosts are applied after the query terms have been selected from 
the priority queue of interesting terms in MoreLikeThis. This can give 
unexpected results when used in combination with mlt.maxqt to limit the number 
of query terms. For example, suppose you use fields fieldA and fieldB, boost 
them as "fieldA^0.5 fieldB^2.0", and set maxqt to 20. If the terms in fieldA 
have relatively higher tf-idf scores than those in fieldB, all 20 selected 
terms will come from fieldA and form the basis for the MoreLikeThis query, 
even if, after boosting, there are terms in fieldB with a higher overall score. 
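
To make the ordering problem concrete, here is a minimal, self-contained sketch 
in plain Java. It is not the MoreLikeThis or MoreLikeThisHandler code: the Term 
class, the method names, and the sample scores are all hypothetical, and the 
example uses a maxqt of 2 instead of 20 to keep the data small. It only 
contrasts "select, then boost" (the behavior described above) with "boost, then 
select" (the proposed behavior).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MltBoostOrder {

    /** Hypothetical stand-in for an "interesting term": field, text, tf-idf score. */
    static final class Term {
        final String field;
        final String text;
        final double score;

        Term(String field, String text, double score) {
            this.field = field;
            this.text = text;
            this.score = score;
        }
    }

    /** Returns a copy of the terms with each score multiplied by its field boost. */
    static List<Term> applyBoosts(List<Term> terms, Map<String, Double> boosts) {
        List<Term> boosted = new ArrayList<>();
        for (Term t : terms) {
            double boost = boosts.getOrDefault(t.field, 1.0);
            boosted.add(new Term(t.field, t.text, t.score * boost));
        }
        return boosted;
    }

    /** Returns the maxQt highest-scoring terms, best first. */
    static List<Term> topTerms(List<Term> terms, int maxQt) {
        List<Term> sorted = new ArrayList<>(terms);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(sorted.subList(0, Math.min(maxQt, sorted.size())));
    }

    /** Behavior described in this issue: truncate to maxQt first, boost afterwards. */
    static List<Term> selectThenBoost(List<Term> terms, Map<String, Double> boosts, int maxQt) {
        return applyBoosts(topTerms(terms, maxQt), boosts);
    }

    /** Proposed behavior: boost first, then truncate to maxQt. */
    static List<Term> boostThenSelect(List<Term> terms, Map<String, Double> boosts, int maxQt) {
        return topTerms(applyBoosts(terms, boosts), maxQt);
    }

    /** Made-up sample data: fieldA terms have the higher raw tf-idf scores. */
    static List<Term> sampleTerms() {
        return Arrays.asList(
            new Term("fieldA", "rare1", 9.0),
            new Term("fieldA", "rare2", 8.0),
            new Term("fieldB", "comedy", 5.0),
            new Term("fieldB", "action", 4.0));
    }

    /** Boosts equivalent to "fieldA^0.5 fieldB^2.0". */
    static Map<String, Double> sampleBoosts() {
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("fieldA", 0.5);
        boosts.put("fieldB", 2.0);
        return boosts;
    }

    public static void main(String[] args) {
        for (Term t : selectThenBoost(sampleTerms(), sampleBoosts(), 2)) {
            System.out.println("select-then-boost: " + t.field + ":" + t.text + " " + t.score);
        }
        for (Term t : boostThenSelect(sampleTerms(), sampleBoosts(), 2)) {
            System.out.println("boost-then-select: " + t.field + ":" + t.text + " " + t.score);
        }
    }
}
```

With this sample data, select-then-boost keeps only the two fieldA terms (their 
boosted scores end up as 4.5 and 4.0), while boost-then-select keeps only the 
two fieldB terms (10.0 and 8.0), which is the result the boosts were meant to 
produce.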

I encountered this while using document descriptive text and document tags 
(comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the tags 
higher; however, the less common document text terms were always selected as 
the query terms, while the more common tag terms were eliminated by the maxqt 
parameter before their scores were boosted. 

I believe the code was originally written this way so that the bulk of the 
work could be done in the MoreLikeThisHandler without modifying the 
MoreLikeThis class in the Lucene project. Now that the projects are merged, I 
think this modification makes sense. I will be attaching a simple patch to 
trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.