[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610648#comment-13610648 ]

Commit Tag Bot commented on SOLR-3981:
--------------------------------------

[branch_4x commit] Chris M. Hostetter
http://svn.apache.org/viewvc?view=revision&revision=1401920

SOLR-3988: Fixed SolrTestCaseJ4.adoc(SolrInputDocument) to respect field and document boosts
SOLR-3981: Fixed bug that resulted in document boosts being compounded in destination fields (merge r41401916)

> docBoost is compounded on copyField
> -----------------------------------
>
>                 Key: SOLR-3981
>                 URL: https://issues.apache.org/jira/browse/SOLR-3981
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-3981.patch, SOLR-3981.patch, SOLR-3981.patch
>
>
> As noted by Toke in a comment on SOLR-3875...
> https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233
> {quote}
> While boosting of multi-value fields is handled correctly in Solr 4.0.0,
> boosting for copyFields is not. A sample document:
> {code}
>
> Insane score Example. Score = 10E9
> Document boost broken for copyFields
> video ThomasEgense and Toke Eskildsen
> Test
> bug
> something else
> bug
> bug
>
> {code}
> The fields name, manu, cat, features, keywords and content get copied to
> text, and a search for thomasegense matches the text field with this query
> explanation:
> {code}
> 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result of:
>   70384.67 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     0.30685282 = idf(docFreq=1, maxDocs=1)
>     229376.0 = fieldNorm(doc=0)
> {code}
> If the last two fields, keywords and content, are removed from the sample
> document, the score is reduced by a factor of 100 (docBoost^2).
> {quote}
> (This is a continuation of some of the problems caused by the changes made
> when the concept of docBoost was eliminated from the underlying IndexWriter
> code, and overlooked due to the lack of testing of docBoosts at the Solr
> level - SOLR-3885)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
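The numbers in the query explanation quoted above can be checked by hand. The sketch below recomputes the fieldWeight product and the idf value, and shows how each extra application of the docBoost multiplies the result. The class and method names are hypothetical and not part of Solr or Lucene; the idf formula 1 + ln(maxDocs / (docFreq + 1)) is the classic DefaultSimilarity formula, and the docBoost of 10 is an assumption inferred from the "factor of 100 (docBoost^2)" remark, since the sample document's markup is not shown.

```java
// Hypothetical helper for checking the explain output; not Solr/Lucene code.
class FieldWeightCheck {

    // Classic DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight as broken down in the explain output: tf * idf * fieldNorm.
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // Factor by which the norm is inflated when docBoost is multiplied in
    // 'extraApplications' more times than it should be.
    static double compoundedFactor(double docBoost, int extraApplications) {
        return Math.pow(docBoost, extraApplications);
    }

    public static void main(String[] args) {
        float idf = idf(1, 1); // docFreq=1, maxDocs=1 as in the explanation
        System.out.println("idf         = " + idf);
        System.out.println("fieldWeight = " + fieldWeight(1.0f, idf, 229376.0f));
        // Assumed docBoost of 10; two extra source fields copied into "text".
        System.out.println("extra boost = " + compoundedFactor(10.0, 2));
    }
}
```

Under these assumptions, removing two source fields removes two extra applications of the docBoost, which matches the reported drop by a factor of 100.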
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483561#comment-13483561 ]

Hoss Man commented on SOLR-3981:
--------------------------------

tests & precommit look good ... unless anyone spots any problems i'll commit later today.
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483508#comment-13483508 ]

Hoss Man commented on SOLR-3981:
--------------------------------

bq. that adoc() you are using doesn't work with boosts. (I found this from another test)

Grr... thanks rmuir, never would have even thought to check that ... easy fix.

bq. Applying the boosts once from all source fields for a given copyField destination seems a bit strange to me, but since it is old behaviour, I understand that it cannot be changed.

right ... copyField has always copied the _field_ boosts, the bug here is the compounded docBoost.

FWIW: we could add a ton more options to copyField to give more fine-grained control over stuff like this, if you'd like to file some Jiras for feature improvements along those lines -- but personally i think: a) update processors make more sense for stuff like this; b) people should move away from doc/field boosts and start doing more with functions on numeric fields (and ultimately DocValues fields) where you have a lot more control of this stuff
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483047#comment-13483047 ]

Toke Eskildsen commented on SOLR-3981:
--------------------------------------

Thank you for investigating this so quickly, Hoss.

Applying the boosts once from all source fields for a given copyField destination seems a bit strange to me, but since it is old behaviour, I understand that it cannot be changed.
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482915#comment-13482915 ]

Robert Muir commented on SOLR-3981:
-----------------------------------

that adoc() you are using doesn't work with boosts. (I found this from another test)
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482765#comment-13482765 ]

Hoss Man commented on SOLR-3981:
--------------------------------

Toke suggested in SOLR-3875...

{quote}
One solution would be to keep track of used fields (directly specified as well as copyFields) and only assign the full boost once per document. If the number of unique fields/document is low, a simple list would probably be the fastest and with low GC impact. For a higher number of unique fields, a Set might be better. An optimization would be to only create the tracking structure once a boost != 1.0f is encountered and only store the fields with boost != 1.0f, so that an update without boosts would not get a performance penalty.
{quote}

I _was_ thinking that a more straightforward solution would be to build up the entire "Document" w/o any regard to the docBoost, and then only at the end loop over the fields in that Document and multiply in the docBoost if the field is indexed & !omitNorms -- but then i realized that at that level there is no general way to "set" the boost.

I'm working on a patch with a test demonstrating the problem ... that may help inform an appropriate solution.
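Toke's suggestion quoted above (track which fields have already received the docBoost and apply it only once per field) can be sketched as follows. This is a simplified, hypothetical model rather than the committed fix: `DocBoostOnce`, `FieldValue` and `combinedBoosts` are made-up names, and the code does not use Solr's DocumentBuilder or schema APIs. It assumes every value (whether given directly or produced by a copyField) arrives as a field name plus a field boost.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of "apply docBoost once per field".
class DocBoostOnce {

    // One value headed for an indexed field, with its own field boost.
    static final class FieldValue {
        final String field;
        final float fieldBoost;

        FieldValue(String field, float fieldBoost) {
            this.field = field;
            this.fieldBoost = fieldBoost;
        }
    }

    // Returns the combined boost per field name. Field boosts of all values
    // are multiplied together; docBoost is multiplied in exactly once per
    // field, no matter how many values end up in that field.
    static Map<String, Float> combinedBoosts(List<FieldValue> values, float docBoost) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        Set<String> alreadyBoosted = null; // created lazily, only when a docBoost is set
        for (FieldValue v : values) {
            float boost = v.fieldBoost;
            if (docBoost != 1.0f) {
                if (alreadyBoosted == null) {
                    alreadyBoosted = new HashSet<>();
                }
                if (alreadyBoosted.add(v.field)) {
                    boost *= docBoost; // first value seen for this field only
                }
            }
            boosts.merge(v.field, boost, (a, b) -> a * b);
        }
        return boosts;
    }

    public static void main(String[] args) {
        List<FieldValue> values = java.util.Arrays.asList(
                new FieldValue("name", 1.0f),
                new FieldValue("text", 1.0f),  // copied from name
                new FieldValue("text", 1.0f),  // copied from manu
                new FieldValue("text", 1.0f)); // copied from cat
        System.out.println(combinedBoosts(values, 10.0f));
    }
}
```

The tracking set is only allocated when the docBoost differs from 1.0f, so documents without a boost pay no extra cost, which is the optimization Toke describes.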