Re: document boost

Mark Miller Wed, 30 Jan 2008 12:18:45 -0800

If you look at DocumentsWriter at line 715 you will see the docBoost getset to the docBoost you specified. At 1376 you will see boost getassigned docBoost. Then at 1509 you see how the doc boost is multipliedby the field boost: * boost *= field.getBoost();

So now you have the default field boost of 1 * the docBoost you passed.Still just the docBoost. Then at 690 you see where the norm is saved inthe index: float norm = fp.boost *writer.getSimilarity().lengthNorm(fp.fieldInfo.name, fp.length);

As you can see, its just the boost * the lengthNorm. Still no way to beexponential.

Now the norm is read from different scorers, but for example, inTermScorer at line 126 you see: return raw *Similarity.decodeNorm(norms[doc]); // normalize for field


Again just a multiplication. No way for exponential involving that norm.

And finally, i have watched Lucene in action and observed the properincrease. For a similar setup as you describe I can put in a field boostof 60 and still get a norm of only 20. It just doesnt explode. When Idouble the boost, the field norm is doubled. Which follows from the code.


Maybe the issue is Solr?

- Mark



Mike Grafton wrote:

Thanks for your help, Mark.We can start by posting our SOLR config files, although I'm not sureif that will be helpful (we don't see much in there regardingboosts). See attached. How SOLR actually configures and interfaceswith Lucene is a bit of an unknown to us, so I'm not sure we can getdown to the raw Lucene configuration and interaction.

That being said, in addition to the SOLR config files, see ourattached XML which we post to SOLR to add the document to the index.

How do you know that boost should affect fieldNorm linearly? Is theresome code you can point us to? We looked through the Lucene sourcefor a while, but it was kind of hard to track this down.

One note: we're on an old version of Lucene - a nightly build between2.0.0 and 2.1.0.


Mike

On 1/30/08, *Mark Miller* <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote:


    I would say you def misconfigured something. Doubling your doc boost
    will double your fieldNorm approximately (I think the precision isn't
    perfect).

    I don't know what your doing wrong in such a small test, but your
    fieldNorm should *not* be exploding like that.

    Can you post some code?

    - Mark

    Mike Grafton wrote:
    > Hello folks,
    >
    > We're trying to use Lucene's scoring to do a fairly basic thing:
    give a
    > document (in this case, we index "articles") a boost based on an
    integer
    > value that we know at index-time.  We want the  document boost
    to affect the
    > final document score linearly.
    >
    > We thought that assigning a document boost based on this value
    would do the
    > trick, but the behavior we're seeing doesn't match what we
    expect given the
    > online documentation.  In fact, we see that a linear increase in
    document
    > boost yields an exponential increase in the 'fieldNorm'
    component of the
    > score for each term of the query that matched the
    document.    Here's a
    > small table of values that relate the document boost we pass in
    to the
    > fieldNorm contribution returned by Lucene:
    >
    > boost  fieldNorm
    > 1.0    0.3125    =  (5/16, 2^-1.678)
    > 2.0    20.0      =  (2^4.3219, 2^1 * 10)
    > 3.0    256.0     =  (2^8)
    > 4.0    1280.0    =  (2^10.3219, 2^7 * 10)
    > 5.0    5120.0    =  (2^12.3219, 2^9 * 10)
    > 6.0    16384.0   =  (2^14.0)
    > 7.0    40960.0   =  (2^15.3219, 2^12 * 10)
    > 8.0    81920.0   =  (2^16.3219, 2^13 * 10)
    > 10.0   327680.0  =  (2^18.3219, 2^15 * 10)
    >
    > This example is using a query with two terms against a document that
    > contains those terms and a few others, in one searchable field.
    >
    > Is this the way document boost is supposed to work?  Or have we
    > misconfigured something? If we cannot use document boost to
    affect scoring
    > linearly, is there some other technique we can use?
    >
    > By the way, we're using SOLR to access Lucene.  We can give more
    information
    > if necessary, such as our SOLR schema.xml, if folks think that
    would help
    > explain things.  Let us know what other information we can provide.
    >
    > Thanks,
    > Mike
    >
    >

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>
    For additional commands, e-mail: [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>


------------------------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: document boost

Reply via email to