[ https://issues.apache.org/jira/browse/SOLR-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Mattozzi updated SOLR-2304:
--------------------------------
    Attachment: SOLR-2304.patch

Patch to Lucene trunk to apply field level boosts before query terms are selected in MoreLikeThis.

> MoreLikeThis: Apply field level boosts before query terms are selected
> ----------------------------------------------------------------------
>
>                 Key: SOLR-2304
>                 URL: https://issues.apache.org/jira/browse/SOLR-2304
>             Project: Solr
>          Issue Type: Improvement
>          Components: MoreLikeThis
>    Affects Versions: 1.4.2
>            Reporter: Mike Mattozzi
>            Priority: Minor
>             Fix For: 1.4.2
>
>         Attachments: SOLR-2304.patch
>
>
> MoreLikeThis provides the ability to set field level boosts to weight the
> importance of fields in selecting similar documents. Currently, in trunk,
> these field level boosts are applied after the query terms have been selected
> from the priority queue of interesting terms in MoreLikeThis. This can give
> unexpected results when used in combination with mlt.maxqt to limit the
> number of query terms. For example, if you use fields fieldA and fieldB and
> boost them "fieldA^0.5 fieldB^2.0" with a maxqt parameter of 20, and the terms
> in fieldA have relatively higher tf-idf scores than those in fieldB, only 20
> fieldA terms will be selected as the basis for the MoreLikeThis query, even if
> after boosting there are terms in fieldB with a higher overall score.
>
> I encountered this while using document descriptive text and document tags
> (comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the
> tags higher; however, the less common document text terms were always selected
> as the query terms, while the more common tag terms were eliminated by the
> maxqt parameter before their scores were boosted.
>
> I believe the code was originally written this way so that the bulk of the
> work could be done in the MoreLikeThisHandler without modifying the
> MoreLikeThis class in the Lucene project. Now that the projects are merged, I
> think this modification makes sense. I will be attaching a simple patch to
> trunk.

--
This message is automatically generated by JIRA.
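
The ordering problem described in the issue can be illustrated with a small standalone sketch. This is not the attached patch and does not use the Lucene or Solr APIs. The class `BoostOrderSketch`, the `ScoredTerm` type, the method names, and the sample scores are all hypothetical, chosen only to show how a term limit such as mlt.maxqt interacts with the point at which field boosts are applied.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the MoreLikeThis implementation.
final class BoostOrderSketch {

    /** A candidate term with its (made-up) tf-idf style score. */
    static final class ScoredTerm {
        final String field;
        final String text;
        final double score;

        ScoredTerm(String field, String text, double score) {
            this.field = field;
            this.text = text;
            this.score = score;
        }

        /** Returns a copy whose score is multiplied by the field's boost (1.0 if none). */
        ScoredTerm boosted(Map<String, Double> boosts) {
            return new ScoredTerm(field, text, score * boosts.getOrDefault(field, 1.0));
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Keeps the maxQueryTerms highest-scoring terms, best first. */
    static List<ScoredTerm> topN(List<ScoredTerm> terms, int maxQueryTerms) {
        return terms.stream()
                .sorted(Comparator.comparingDouble((ScoredTerm t) -> t.score).reversed())
                .limit(maxQueryTerms)
                .collect(Collectors.toList());
    }

    /** Order described as the current behaviour: select terms, then boost the survivors. */
    static List<ScoredTerm> selectThenBoost(List<ScoredTerm> terms,
                                            Map<String, Double> boosts, int maxQueryTerms) {
        return topN(terms, maxQueryTerms).stream()
                .map(t -> t.boosted(boosts))
                .collect(Collectors.toList());
    }

    /** Order proposed in the issue: boost every candidate, then select. */
    static List<ScoredTerm> boostThenSelect(List<ScoredTerm> terms,
                                            Map<String, Double> boosts, int maxQueryTerms) {
        List<ScoredTerm> boosted = terms.stream()
                .map(t -> t.boosted(boosts))
                .collect(Collectors.toList());
        return topN(boosted, maxQueryTerms);
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("fieldA", 0.5); // corresponds to fieldA^0.5
        boosts.put("fieldB", 2.0); // corresponds to fieldB^2.0

        List<ScoredTerm> terms = Arrays.asList(
                new ScoredTerm("fieldA", "rare1", 10.0),
                new ScoredTerm("fieldA", "rare2", 9.0),
                new ScoredTerm("fieldB", "comedy", 6.0),
                new ScoredTerm("fieldB", "action", 5.0));

        // With a limit of 2, the unboosted ranking keeps only fieldA terms.
        System.out.println(selectThenBoost(terms, boosts, 2)); // [fieldA:rare1, fieldA:rare2]
        // Boosting first lets the fieldB terms (12.0 and 10.0) win.
        System.out.println(boostThenSelect(terms, boosts, 2)); // [fieldB:comedy, fieldB:action]
    }
}
```

With these made-up scores, selecting before boosting never gives the fieldB tags a chance, which matches the tag scenario reported above. Boosting first changes which terms survive the limit, not just their weights in the final query.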