[ 
https://issues.apache.org/jira/browse/SOLR-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462105#comment-13462105
 ] 

Mark Miller commented on SOLR-3875:
-----------------------------------

bq. patch with proposed test & fix 

+1

I applied the patch, inspected the fix, inspected the test. It looks right to 
me.

I also ran all tests, and verified the new test fails as expected without the 
fix.
                
> Document boost does not work correctly when using multi-valued fields
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3875
>                 URL: https://issues.apache.org/jira/browse/SOLR-3875
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, update
>    Affects Versions: 4.0-BETA
>            Reporter: Toke Eskildsen
>            Priority: Critical
>             Fix For: 4.0, 4.1, 5.0
>
>         Attachments: SOLR-3875.patch
>
>
> In Solr 4 BETA & trunk, document boosts skew the ranking for documents with 
> multi-valued fields tremendously. A document boost of 5 combined with 15 
> values in a multi-valued field results in scores above 1,000,000,000, while a 
> boost of 0.5 results in scores below 0.001. The error is not present in Solr 
> 3.6.
> Thomas Egense and I have tracked it down to a change in Solr DocumentBuilder 
> committed 20110827 (@1162347) by Mike McCandless, as part of work done on 
> LUCENE-2308. The problem is that Lucene multiplies the boosts of multiple 
> instances of the same field when updating the index.
> The old DocumentBuilder, used in Solr 3.6, handled this by calculating the 
> boost for the field (docBoost*fieldBoost) and assigning it to the first 
> instance of the field, then setting the boost to 1.0f and assigning that to 
> subsequent instances of the field. This effectively assigned 
> docBoost*fieldBoost to the field, regardless of the number of instances.
> The updated DocumentBuilder (see 
> https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1388778&view=markup),
>  used in Solr 4 BETA & trunk, also assigns docBoost*fieldBoost to the first 
> instance of the field. Then it sets fieldBoost = docBoost and continues to 
> assign docBoost*fieldBoost to subsequent instances. Using the example 
> mentioned above, the generated IndexableFields will get assigned boosts of 5, 
> 5*5, 5*5... 5*5. As Lucene multiplies all the values, 15 instances of the 
> same field will have a collective boost of 5*25^14.
> This can be demonstrated with the Solr tutorial example by indexing the 
> sample documents and adding the document 
> {code:xml}
> <add>
> <doc boost="5">
>   <field name="id">Insane score Example. Score = 10E9 </field>
>   <field name="name">Document boost broken for multivalued fields</field>
>   <field name="manu">Thomas Egense and Toke Eskildsen</field>
>   <field name="manu_id_s">Test</field>
>   <field name="cat">bug</field>
>   <field name="features">insane_boost</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>  
> </doc>
> </add>
> {code}
> The _manu_ & _features_ fields get copied to _text_, and a search for 
> _thomas_ matches the _text_ field with the query explanation
> {code:xml}
> <str name="Insane score Example. Score = 10E10 ">
> 2.44373361E10 = (MATCH) weight(text:thomas in 0) [DefaultSimilarity], result of:
>   2.44373361E10 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     3.2512918 = idf(docFreq=3, maxDocs=38)
>     7.5161928E9 = fieldNorm(doc=0)
> </str>
> {code}
> Thomas and I are too pressed for time to attempt a proper patch at the 
> moment, but we guess that a reversion to the old algorithm of assigning the 
> combined boost to the first instance and 1.0f to all subsequent instances 
> would work?
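
The boost arithmetic described in the issue can be sketched in a few lines of plain Java. This is only an illustration of the two assignment schemes as the description words them, not the actual DocumentBuilder code: the class name `BoostSketch` and its methods are hypothetical, the field boost is assumed to be 1.0, and Lucene is modelled as simply multiplying the per-instance boosts together.

```java
public class BoostSketch {

    /** 3.6-style scheme: the first instance carries docBoost*fieldBoost, the rest carry 1.0. */
    static double[] oldStyleBoosts(double docBoost, double fieldBoost, int instances) {
        double[] boosts = new double[instances];
        for (int i = 0; i < instances; i++) {
            boosts[i] = (i == 0) ? docBoost * fieldBoost : 1.0;
        }
        return boosts;
    }

    /** 4.0-BETA scheme as described: after the first instance, fieldBoost is set to docBoost. */
    static double[] betaStyleBoosts(double docBoost, double fieldBoost, int instances) {
        double[] boosts = new double[instances];
        double currentFieldBoost = fieldBoost;
        for (int i = 0; i < instances; i++) {
            boosts[i] = docBoost * currentFieldBoost;
            currentFieldBoost = docBoost; // every later instance gets docBoost*docBoost
        }
        return boosts;
    }

    /** Assumed model of the index side: boosts of same-named field instances are multiplied. */
    static double product(double[] boosts) {
        double result = 1.0;
        for (double b : boosts) {
            result *= b;
        }
        return result;
    }

    public static void main(String[] args) {
        // Document boost 5, field boost 1.0 (assumed), 15 values in the multi-valued field.
        System.out.println("old style:  " + product(oldStyleBoosts(5.0, 1.0, 15)));
        System.out.println("beta style: " + product(betaStyleBoosts(5.0, 1.0, 15)));
    }
}
```

With the old scheme the combined boost stays at docBoost*fieldBoost however many values the field has; with the BETA scheme it grows to 5*25^14 for the example above, and shrinks to 0.5*0.25^14 for a document boost of 0.5.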
