Re: Boosting for MoreLikeThisHandler

2008-06-06 Thread Grant Ingersoll

I think a separate issue is warranted.  Can you add unit tests, too?

On Jun 5, 2008, at 3:33 PM, Tom Morton wrote:

> Hi,
>    SOLR-295 mentions boost support for morelikethis and then seems
> to have been subsumed by SOLR-281 but I don't think this got
> implemented.  I've patched MoreLikeThisHandler to support this.
> Here's a summary of the approach:
>
>   Parse out mlt.qf parameters to get boosts in dismax like format
> (existing code from DisMax param parse code used to produce a
> Map<String,Float>)
>   Iterate through mltquery terms, get boost by looking at field from
> which mltquery term came,  and multiply boost specified in map by
> existing term boost.
> If mlt.boost=false, then you get the same boost values as in
> map/mlt.qf parameters,
> If mlt.boost=true then you get normalized boost multiplied by
> specified boost (which makes sense to me).
>
> Patch attached.
>
> Should I re-open either SOLR-281 or SOLR-295 (if I can) or create a
> new jira ticket for this?
>
> Thanks...Tom
> morelikethis.patch





Re: Boosting for MoreLikeThisHandler

2008-06-06 Thread Tom Morton
   I'm trying to add some unit tests now.  MoreLikeThisHandler has only a
minimal set of tests, which don't actually test its output, just that the
fields and query parameters are required.  It'll take me a bit to write
something for the regular version and then for the version with specified
boosts.

On Fri, Jun 6, 2008 at 6:46 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:

> I think a separate issue is warranted.  Can you add unit tests, too?
>
> On Jun 5, 2008, at 3:33 PM, Tom Morton wrote:
>
>> Hi,
>>    SOLR-295 mentions boost support for morelikethis and then seems to have
>> been subsumed by SOLR-281 but I don't think this got implemented.  I've
>> patched MoreLikeThisHandler to support this.  Here's a summary of the
>> approach:
>>
>>   Parse out mlt.qf parameters to get boosts in dismax like format
>> (existing code from DisMax param parse code used to produce a
>> Map<String,Float>)
>>   Iterate through mltquery terms, get boost by looking at field from which
>> mltquery term came,  and multiply boost specified in map by existing term
>> boost.
>> If mlt.boost=false, then you get the same boost values as in map/mlt.qf
>> parameters,
>> If mlt.boost=true then you get normalized boost multiplied by specified
>> boost (which makes sense to me).
>>
>> Patch attached.
>>
>> Should I re-open either SOLR-281 or SOLR-295 (if I can) or create a new
>> jira ticket for this?
>>
>> Thanks...Tom
>> morelikethis.patch






Boosting for MoreLikeThisHandler

2008-06-05 Thread Tom Morton
Hi,
   SOLR-295 (https://issues.apache.org/jira/browse/SOLR-295) mentions boost
support for morelikethis and then seems to have been subsumed by
SOLR-281 (https://issues.apache.org/jira/browse/SOLR-281), but I don't
think this got implemented.  I've patched MoreLikeThisHandler to
support this.  Here's a summary of the approach:


   - Parse out the mlt.qf parameters to get boosts in dismax-like format
     (existing DisMax param parsing code is used to produce a
     Map<String,Float>).
   - Iterate through the mltquery terms, look up the boost for the field
     each term came from, and multiply the boost specified in the map by
     the existing term boost.
   - If mlt.boost=false, then you get the same boost values as in the
     map/mlt.qf parameters.
   - If mlt.boost=true, then you get the normalized boost multiplied by the
     specified boost (which makes sense to me).
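
To make the boost arithmetic in the list above concrete, here is a minimal, self-contained Java sketch. It is not the patch itself: MltBoostSketch, its parseFieldBoosts stand-in, and combinedBoost are hypothetical names. The parser only mimics the dismax-style field^boost syntax instead of calling Solr's own parsing code, and the 0.25 and 0.75 term boosts are invented values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MltBoostSketch {

    /**
     * Hypothetical stand-in for the dismax-style parser (an assumption, not Solr code):
     * turns "title^2.0 body^0.5 tags" into {title=2.0, body=0.5, tags=null}.
     */
    static Map<String, Float> parseFieldBoosts(String qf) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        if (qf == null || qf.trim().isEmpty()) {
            return boosts;
        }
        for (String spec : qf.trim().split("\\s+")) {
            int caret = spec.indexOf('^');
            if (caret < 0) {
                boosts.put(spec, null); // field listed without an explicit boost
            } else {
                boosts.put(spec.substring(0, caret),
                           Float.valueOf(spec.substring(caret + 1)));
            }
        }
        return boosts;
    }

    /** Multiplies the term's existing boost by the boost configured for its field, if any. */
    static float combinedBoost(Map<String, Float> fieldBoosts, String field, float termBoost) {
        Float fieldBoost = fieldBoosts.get(field);
        return fieldBoost == null ? termBoost : fieldBoost * termBoost;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = parseFieldBoosts("title^2.0 body^0.5");
        // mlt.boost=false: the term boost stays at 1.0, so the result is the mlt.qf value.
        System.out.println(combinedBoost(boosts, "title", 1.0f));  // prints 2.0
        // mlt.boost=true: the term already carries a normalized boost (0.25 is made up).
        System.out.println(combinedBoost(boosts, "body", 0.25f));  // prints 0.125
        // A field that is not listed in mlt.qf keeps its existing term boost.
        System.out.println(combinedBoost(boosts, "tags", 0.75f));  // prints 0.75
    }
}
```

With a request such as mlt.qf=title^2.0 body^0.5, terms drawn from title would end up weighted four times as heavily as terms from body, before any per-term normalization is applied.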


Patch attached.

Should I re-open either SOLR-281 or SOLR-295 (if I can) or create a new jira
ticket for this?

Thanks...Tom
Index: src/java/org/apache/solr/common/params/MoreLikeThisParams.java
===
--- src/java/org/apache/solr/common/params/MoreLikeThisParams.java	(revision 663326)
+++ src/java/org/apache/solr/common/params/MoreLikeThisParams.java	(working copy)
@@ -35,6 +35,7 @@
   public final static String MAX_QUERY_TERMS   = PREFIX + "maxqt";
   public final static String MAX_NUM_TOKENS_PARSED = PREFIX + "maxntp";
   public final static String BOOST = PREFIX + "boost"; // boost or not?
+  public final static String QF= PREFIX + "qf"; //boosting applied to mlt fields
 
   // the /mlt request handler uses 'rows'
   public final static String DOC_COUNT = PREFIX + "count";
Index: src/java/org/apache/solr/handler/MoreLikeThisHandler.java
===
--- src/java/org/apache/solr/handler/MoreLikeThisHandler.java	(revision 663326)
+++ src/java/org/apache/solr/handler/MoreLikeThisHandler.java	(working copy)
@@ -23,8 +23,11 @@
 import java.net.URL;
 import java.util.ArrayList;
 import java.util.Comparator;
+import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Map;
+import java.util.Set;
 import java.util.regex.Pattern;
 
 import org.apache.lucene.document.Document;
@@ -37,6 +40,7 @@
 import org.apache.lucene.search.similar.MoreLikeThis;
 import org.apache.solr.common.SolrException;
 import org.apache.solr.common.params.CommonParams;
+import org.apache.solr.common.params.DisMaxParams;
 import org.apache.solr.common.params.FacetParams;
 import org.apache.solr.common.params.MoreLikeThisParams;
 import org.apache.solr.common.params.SolrParams;
@@ -51,7 +55,6 @@
 import org.apache.solr.schema.IndexSchema;
 import org.apache.solr.schema.SchemaField;
 import org.apache.solr.search.DocIterator;
-import org.apache.solr.search.DocSet;
 import org.apache.solr.search.DocList;
 import org.apache.solr.search.DocListAndSet;
 import org.apache.solr.search.QueryParsing;
@@ -231,6 +234,7 @@
     final IndexReader reader;
     final SchemaField uniqueKeyField;
     final boolean needDocSet;
+    Map<String,Float> boostFields;
 
     Query mltquery;  // expose this for debugging
 
@@ -260,12 +264,25 @@
       mlt.setMaxQueryTerms( params.getInt(MoreLikeThisParams.MAX_QUERY_TERMS,   MoreLikeThis.DEFAULT_MAX_QUERY_TERMS));
       mlt.setMaxNumTokensParsed(params.getInt(MoreLikeThisParams.MAX_NUM_TOKENS_PARSED, MoreLikeThis.DEFAULT_MAX_NUM_TOKENS_PARSED));
       mlt.setBoost(params.getBool(MoreLikeThisParams.BOOST, false ) );
+      boostFields = SolrPluginUtils.parseFieldBoosts(params.getParams(MoreLikeThisParams.QF));
     }
 
+    private void setBoosts(Query mltquery) {
+      List clauses = ((BooleanQuery)mltquery).clauses();
+      for( Object o : clauses ) {
+        TermQuery q = (TermQuery)((BooleanClause)o).getQuery();
+        Float b = this.boostFields.get(q.getTerm().field());
+        if (b != null) {
+          q.setBoost(b*q.getBoost());
+        }
+      }
+    }
+
     public DocListAndSet getMoreLikeThis( int id, int start, int rows, List<Query> filters, List<InterestingTerm> terms, int flags ) throws IOException
     {
       Document doc = reader.document(id);
       mltquery = mlt.like(id);
+      setBoosts(mltquery);
       if( terms != null ) {
         fillInterestingTermsFromMLTQuery( mltquery, terms );
       }
@@ -289,6 +306,7 @@
     public DocListAndSet getMoreLikeThis( Reader reader, int start, int rows, List<Query> filters, List<InterestingTerm> terms, int flags ) throws IOException
     {
       mltquery = mlt.like(reader);
+      setBoosts(mltquery);
       if( terms != null ) {
         fillInterestingTermsFromMLTQuery( mltquery, terms );
       }