Could it be that the tokenization scheme for the URL changed between the times you added the documents, i.e. that it yielded more tokens when you got the low fieldNorm value? The number of documents should not affect the fieldNorm; the value is based on the number of tokens in the field, the field boost, and the document boost:

document boost * field boost * (1/sqrt(terms in field))
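
To sanity-check the numbers in the explain output against this formula, something like the following could be used. It is only a minimal sketch: the class and method names are made up for illustration and are not Lucene API. The inverse calculation assumes both boosts are 1.0, and the sketch ignores that Lucene, as far as I know, stores norms in a single byte, so stored values are only approximations of the raw result.

```java
public class FieldNormSketch {

    // Raw (unencoded) norm: document boost * field boost * (1/sqrt(terms in field)).
    public static double fieldNorm(double documentBoost, double fieldBoost, int termsInField) {
        if (termsInField <= 0) {
            throw new IllegalArgumentException("termsInField must be positive");
        }
        return documentBoost * fieldBoost * (1.0 / Math.sqrt(termsInField));
    }

    // Inverse of the formula, ASSUMING both boosts are 1.0:
    // norm = 1/sqrt(terms)  =>  terms = 1/norm^2.
    public static double impliedTermCount(double norm) {
        if (norm <= 0.0) {
            throw new IllegalArgumentException("norm must be positive");
        }
        return 1.0 / (norm * norm);
    }

    public static void main(String[] args) {
        // Norm for a 10-term field with no boosts.
        System.out.println(fieldNorm(1.0, 1.0, 10));
        // Term counts implied by the two fieldNorm values from the explain output.
        System.out.println(impliedTermCount(0.3125));
        System.out.println(impliedTermCount(4.5776367E-5));
    }
}
```

With boosts of 1.0, a fieldNorm of 0.3125 corresponds to roughly 10 terms, whereas 4.5776367E-5 would imply hundreds of millions of terms in the field. If that reading is right, a change in token count alone seems unlikely to explain the low value, which would point towards a boost or a different similarity.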

Or perhaps you were using different similarity classes where lengthNorm(String,int) differed?

If none of the above, I'm clueless.


       karl

On 5 Oct 2009, at 12:45, Ole-Martin Mørk wrote:

I did not change the url. The length of the title was increased by 1, from
41 to 42 characters.
--
Ole-Martin Mørk


On Mon, Oct 5, 2009 at 12:39 PM, Karl Wettin <karl.wet...@gmail.com> wrote:

Sorry, I meant title.

On 5 Oct 2009, at 11:57, Simon Willnauer wrote:


Ole-Martin, did you mention that you did not change the URL value but the
title?

simon

On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin <karl.wet...@gmail.com>
wrote:

Hi Ole-Martin,

How many characters were in the URL before and after the update?


 karl

On 5 Oct 2009, at 10:21, Ole-Martin Mørk wrote:


Hi. I am trying to understand Lucene's scoring algorithm. We're
getting some strange results. First we search for a given page by its
url. We get this result:

0.0014793393 = fieldWeight(url:"our super secret url" in 22), product
of:
1.0 = tf(phraseFreq=1.0)
32.31666 = idf(url: www=7327 host=321 com=7327 article=2456
something=2 something=44 704290075=1)
4.5776367E-5 = fieldNorm(field=url, doc=22)

When this is done, we use SolrJ to read and write the document. The
only change is to the title of the document (the number 2 is appended).

We search again and the fieldNorm is changed significantly:

9.874598 = fieldWeight(url:"our super secret url" in 0), product of:
1.0 = tf(phraseFreq=1.0)
31.598713 = idf(url: www=7328 host=322 com=7328 article=2457
something=3 somthing=45 704290075=2)
0.3125 = fieldNorm(field=url, doc=0)

Why does the value of fieldNorm change so much?

Looking forward to your answers.

--
Ole-Martin Mørk
http://twitter.com/olemartin
http://flickr.com/olemartin



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



