Re: Per-Field Similarity

Marvin Humphrey Tue, 23 May 2006 16:09:14 -0700

On Tue, May 23, 2006 at 02:49:40PM -0700, Chris Hostetter wrote:
> 
> : Is it possible to have an IndexWriter apply different Similarity
> : models to different Fields?
> 
> As far as I know, the only way Similarity comes into play when using an
> IndexWriter is lengthNorm, and it is passed the fieldName so it's easy to
> make its behavior field specific (SimilarityDelegator makes it easy to do
> this).


Ah.  KinoSearch's implementation inadvertently differs: lengthNorm takes only
the number of tokens.  It didn't occur to me that the problem had been solved
that way.

A while ago, I changed lengthNorm to use either the number of tokens or 100,
whichever was greater.  That had the intended effect, which was to demote
the kind of short "stub" documents Lucene tends to boost.  However, it had
the undesirable effect of no longer boosting exact title matches: titles are
nearly always shorter than 100 tokens, so they all end up with the same norm.

The answer is to use Lucene's default lengthNorm for title and the modified
lengthNorm for bodytext.  Wasn't sure how to do that in Lucene; now I know.
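
For the archives, here is a rough sketch of the per-field idea.  It is
standalone Java, not actual Lucene or KinoSearch code: the method only
mirrors the shape of Lucene's Similarity.lengthNorm(String fieldName, int
numTokens), and the class name, the MIN_TOKENS constant, and the "bodytext"
field name are illustrative assumptions.  The default norm is taken to be
1/sqrt(numTokens), as in Lucene's DefaultSimilarity.

```java
// Standalone sketch; no Lucene dependency. In Lucene itself one would
// presumably override lengthNorm(String, int) in a Similarity subclass.
final class PerFieldLengthNorm {
    // Assumed floor from the description above: fields shorter than this
    // are treated as if they contained this many tokens.
    static final int MIN_TOKENS = 100;

    // Default-style norm: shorter fields get a larger norm.
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Field-specific norm: only "bodytext" (hypothetical field name) uses
    // the floored token count; every other field, including "title",
    // keeps the default behavior.
    static float lengthNorm(String fieldName, int numTokens) {
        if ("bodytext".equals(fieldName)) {
            return defaultLengthNorm(Math.max(numTokens, MIN_TOKENS));
        }
        return defaultLengthNorm(numTokens);
    }
}
```

With this, a 4-token title keeps its full norm of 0.5, while a 4-token
bodytext stub is scored as if it had 100 tokens (norm 0.1).  Body fields
longer than 100 tokens are unaffected.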

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


