Re: Boosting Documents and score calculation

Chris Hostetter Thu, 24 Aug 2006 16:05:51 -0700

First off, when trying to make sense of scores you should always use
either a HitCollector or one of the TopDocs methods of the Searcher
interface -- otherwise the "normalize if greater than 1" logic of the Hits
class might confuse you.
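
To make the normalization point concrete, here is a rough standalone
sketch in plain Java. It does not call Lucene at all: the class and method
names are made up for illustration, and the rule it applies (divide every
score by the top score, but only when the top score is above 1) is my
reading of what Hits does, so treat it as an assumption rather than the
actual implementation.

```java
import java.util.Arrays;

// Standalone sketch: does NOT use Lucene. It only imitates the
// "normalize if greater than 1" rescaling attributed to the Hits class.
class HitsNormalizationSketch {

    // Hypothetical helper: divide every raw score by the top score,
    // but only when that top score exceeds 1.0; otherwise leave scores alone.
    static float[] normalizeLikeHits(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float scale = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        // Top score above 1: everything is rescaled.
        System.out.println(Arrays.toString(
                normalizeLikeHits(new float[] {2.0f, 1.0f, 0.5f}))); // [1.0, 0.5, 0.25]
        // Top score at or below 1: raw scores pass through unchanged.
        System.out.println(Arrays.toString(
                normalizeLikeHits(new float[] {0.8f, 0.4f})));       // [0.8, 0.4]
    }
}
```

The upshot is that a score read from Hits may or may not be the raw score,
depending on the top hit, which is why the raw values from TopDocs or a
HitCollector are easier to reason about.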


Second: Searcher.explain(Query,int) is your friend ... it will help you
understand exactly where your scores are coming from.

Third: index time document boosts are folded into the "norm" value for
that field (along with any index time field boosts and the length norm)
... these norms are "encoded" as a single byte, which can result in a loss
of precision, so it wouldn't be too surprising if boosts of 1.0, 1.1,
and 1.2 all encoded as the same value.  (You can use
Similarity.decodeNorm(Similarity.encodeNorm(some_float)) to see exactly
how much precision is lost for any given float value.)
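
If you want to see the effect without an index, the sketch below does the
same round trip in plain Java. It does not call Lucene: it reimplements,
as an assumption on my part, a one-byte float format with 3 mantissa bits
and 5 exponent bits, which is how I understand the norm encoding to work.
The class and method names are invented for the example, so check the
numbers against Similarity.decodeNorm(Similarity.encodeNorm(x)) in your
own Lucene version before relying on them.

```java
// Standalone sketch: does NOT use Lucene. Assumes the norm byte holds
// 3 mantissa bits and 5 exponent bits, with exponent 15 standing for 2^0.
class NormPrecisionSketch {

    // Squeeze a float into one byte by keeping only its top mantissa bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);          // sign, exponent, 3 mantissa bits
        int zeroPoint = (63 - 15) << 3;        // smallest representable exponent
        if (small <= zeroPoint) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1;                  // overflow: largest encodable value
        }
        return (byte) (small - zeroPoint);
    }

    // Expand the byte back into a float; the dropped mantissa bits are gone.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.1f, 1.2f, 1.3f};
        for (float boost : boosts) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
        // prints:
        // 1.0 -> 1.0
        // 1.1 -> 1.0
        // 1.2 -> 1.0
        // 1.3 -> 1.25
    }
}
```

Under that assumption 1.1 and 1.2 collapse to 1.0 and 1.3 collapses to
1.25, which lines up with the quoted output below: 0.9710705 is 1.25 times
0.7768564.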



: Date: Thu, 24 Aug 2006 10:06:35 -0700 (PDT)
: From: AlexeyG <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Boosting Documents and score calculation
:
:
: Hello,
:
: I ran into some very strange behavior by Lucene 1.9.  Boost factor under 1.3
: does not affect the result score!  I wrote a simple test to isolate the
: issue:
:
: Writing test index
: Creating 4 documents with same KEY and boosts of default, 1.1, 1.2, and 1.3
:
:       public static void writeTestIndex() throws IOException {
:
:               // opening index writer
:               IndexWriter writer = null;
:               writer = new IndexWriter("C:\\a_temp", new StandardAnalyzer(), true);
:
:               Document currentDocument = null;
:
:               // creating and adding document with DEFAULT boost
:               currentDocument = new Document();
:               currentDocument.add(new Field("KEY", "AA",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.add(new Field("BOOST_FACTOR", "1",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               writer.addDocument(currentDocument);
:
:               // creating and adding document with 1.1 boost
:               currentDocument = new Document();
:               currentDocument.add(new Field("KEY", "AA",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.add(new Field("BOOST_FACTOR", "1.1",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.setBoost((float)1.1);
:               writer.addDocument(currentDocument);
:
:               // creating and adding document with 1.2 boost
:               currentDocument = new Document();
:               currentDocument.add(new Field("KEY", "AA",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.add(new Field("BOOST_FACTOR", "1.2",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.setBoost((float)1.2);
:               writer.addDocument(currentDocument);
:
:               // creating and adding document with 1.3 boost
:               currentDocument = new Document();
:               currentDocument.add(new Field("KEY", "AA",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.add(new Field("BOOST_FACTOR", "1.3",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:               currentDocument.setBoost((float)1.3);
:               writer.addDocument(currentDocument);
:
:               // optimizing and closing IndexWriter
:               writer.optimize();
:               writer.close();
:       }
:
:
: Test Search
: Searching for the KEY value, which is the same in all 4 documents
:
:       public static void testIndex() throws IOException {
:
:               // opening IndexSearcher
:               IndexSearcher searcher = null;
:               searcher = new IndexSearcher("C:\\a_temp");
:
:               // searching for KEY
:               Hits hits = searcher.search(new TermQuery(new Term("KEY", "AA")));
:
:               // listing documents and their BOOST_FACTOR field
:               Document doc = null;
:               if (null != hits) {
:                       logger.debug("Listing results: ");
:                       for (int i = 0; i < hits.length(); i++) {
:                               doc = hits.doc(i);
:                               logger.debug("BOOST_FACTOR field: " + doc.get("BOOST_FACTOR")
:                                               + " Score: " + hits.score(i));
:                       }
:               }
:
:               // closing IndexSearcher
:               searcher.close();
:       }
:
: Output
:
: BOOST_FACTOR field: 1.3 Score: 0.9710705
: BOOST_FACTOR field: 1 Score: 0.7768564
: BOOST_FACTOR field: 1.1 Score: 0.7768564
: BOOST_FACTOR field: 1.2 Score: 0.7768564
:
: Boost of 1.1 and 1.2 did not affect score for the last 2 documents!
: Document with boost of 1.3 jumped to the top, but the rest were returned in
: the order they were added to the index.
:
: What am I missing here?  I thought document score would reflect all levels
: of boost, not just 1.3 and above?  Please help.
: --
: View this message in context:
: http://www.nabble.com/Boosting-Documents-and-score-calculation-tf2159899.html#a5968287
: Sent from the Lucene - Java Users forum at Nabble.com.
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:



-Hoss


