Re: RE : Re: problem undestanding the hits.score

Ion Badita Fri, 02 Nov 2007 01:11:18 -0800

For your specific problem you need to change the DefaultSimilarity onlyat index time, because the lengthNorm is written to the index when iscreated.So... first you'll need to extend the DefaultSimilarity and override thelengthNorm() method with the one suggested in the previous replay; thenset your (changed) similarity to the IndexWriter like this:


IndexWriter indexWriter = new IndexWriter(....);
indexWriter.setSimilarity(new YourSimilarity());


Add your documents... and search.


Ion


Jamal H Tandina wrote:

Thank you for your reply
How can i change the defaultSimilarity in the indexing and the searching, do 
you have an example or an url how to set the Similarity  ?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
Thanks againIon Badita <[EMAIL PROTECTED]> a écrit : Try too look at Similarity, there you will find thinks about thescoring. Your query is more "similar" with the shorter document.If you have 2 documents with a field body; first with words "red flower"and the second with just one word "flower", and search for the word"flower", the second document will score high because is very similarwith the query.
If you want to give priority to documents that are larger, like z1, youshould change the DefaultSimilarity (at index time), more exactly themethod:
  public float lengthNorm(String fieldName, int numTerms) {
    return (float)(1.0 / Math.sqrt(numTerms));
  }

to something like this

  public float lengthNorm(String fieldName, int numTerms) {
    return (float)(Math.sqrt(numTerms));
  }
Reindex your documents with the Similarity modified and try to searchagain. The IndexWriter has a method to set the similarity used for indexing.
I hope this will help you...


Ion



Jamal jamalator wrote:
HiI have indexed this html document=============z1========================
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
=============z2=========================
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
=============z3==========================zo zo zo zo zo zo zo zo zo zo zo zo
=========================================
with this code

Field contentK1 = new  
Field("htmlcontent",httpd.getContentKeywords(),Field.Store.NO,Field.Index.TOKENIZED
 );
contentK1.setBoost(1/10f);  //10%
doc.add(contentK1);
and when a search "zo" with luke i have (whitespaceanalyser):
(score , id   )
(0,0957,z2 )
(0,0947,z3 )
(0,0938,z1)

NORMALY the resut expected have to be z1 z2 z3

Some One have an idea ??

Thank you all
---------------------------------
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail---------------------------------Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: RE : Re: problem undestanding the hits.score

Reply via email to