[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2013-03-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610648#comment-13610648
 ] 

Commit Tag Bot commented on SOLR-3981:
--

[branch_4x commit] Chris M. Hostetter
http://svn.apache.org/viewvc?view=revision&revision=1401920

SOLR-3988: Fixed SolrTestCaseJ4.adoc(SolrInputDocument) to respect field and 
document boosts

SOLR-3981: Fixed bug that resulted in document boosts being compounded in 
<copyField/> destination fields

(merge r41401916)







 docBoost is compounded on copyField
 ---

 Key: SOLR-3981
 URL: https://issues.apache.org/jira/browse/SOLR-3981
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.1, 5.0

 Attachments: SOLR-3981.patch, SOLR-3981.patch, SOLR-3981.patch


 As noted by Toke in a comment on SOLR-3875...
 https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233
 {quote}
 While boosting of multi-valued fields is handled correctly in Solr 4.0.0, 
 boosting for copyFields is not. A sample document:
 {code}
 <add><doc boost="10.0">
   <field name="id">Insane score Example. Score = 10E9 </field>
   <field name="name">Document boost broken for copyFields</field>
   <field name="manu">video ThomasEgense and Toke Eskildsen</field>
   <field name="manu_id_s">Test</field>
   <field name="cat">bug</field>
   <field name="features">something else</field>
   <field name="keywords">bug</field>
   <field name="content">bug</field>
 </doc></add>
 {code}
 The fields name, manu, cat, features, keywords and content get copied to 
 text, and a search for thomasegense matches the text field with this query 
 explanation:
 {code}
 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result of:
   70384.67 = fieldWeight in 0, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = termFreq=1.0
     0.30685282 = idf(docFreq=1, maxDocs=1)
     229376.0 = fieldNorm(doc=0)
 {code}
 If the last two fields, keywords and content, are removed from the sample 
 document, the score is reduced by a factor of 100 (docBoost^2).
 {quote}
 (This is a continuation of some of the problems caused by the changes made 
 when the concept of docBoost was eliminated from the underlying IndexWriter 
 code, and overlooked due to the lack of testing of docBoosts at the Solr 
 level - SOLR-3885)
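
The numbers in the explain output above can be checked by hand. The sketch
below is only an arithmetic check, not Solr code: the function names are made
up for illustration, and the idf formula is the classic TF-IDF one
(1 + ln(maxDocs / (docFreq + 1))), which reproduces the reported idf value.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Classic TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(tf, idf, field_norm):
    # fieldWeight in the explain output is the product of its three children
    return tf * idf * field_norm

# Values copied from the explain output in the report
idf = classic_idf(doc_freq=1, max_docs=1)
score = field_weight(tf=1.0, idf=0.30685282, field_norm=229376.0)
print(round(idf, 6), round(score, 2))
```

Since tf and idf are both unremarkable here, the inflated score comes entirely
from the fieldNorm of 229376.0, which is where index-time boosts end up.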

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-24 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483047#comment-13483047
 ] 

Toke Eskildsen commented on SOLR-3981:
--

Thank you for investigating this so quickly, Hoss.

Applying the boosts once from all source fields for a given copyField 
destination seems a bit strange to me, but since it is old behaviour, I 
understand that it cannot be changed.




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483508#comment-13483508
 ] 

Hoss Man commented on SOLR-3981:


bq. that adoc() you are using doesn't work with boosts. (I found this from 
another test)

Grr... thanks rmuir, never would have even thought to check that ... easy fix.

bq. Applying the boosts once from all source fields for a given copyField 
destination seems a bit strange to me, but since it is old behaviour, I 
understand that it cannot be changed.

right ... copyField has always copied the _field_ boosts, the bug here is the 
compounded docBoost.
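
A toy model of that distinction, assuming (as the report suggests) that the
buggy code folded the docBoost in once per copied source. The function names
are hypothetical and none of this is Solr API; it only shows the arithmetic.

```python
from math import prod

def expected_dest_boost(doc_boost, source_field_boosts):
    # Intended behaviour: field boosts of every source are multiplied in,
    # but the docBoost is applied exactly once to the destination field.
    return doc_boost * prod(source_field_boosts)

def compounded_dest_boost(doc_boost, source_field_boosts):
    # Behaviour consistent with the bug report: the docBoost is multiplied
    # in again for every source copied into the destination.
    return prod(doc_boost * b for b in source_field_boosts)

six_sources = [1.0] * 6   # name, manu, cat, features, keywords, content
four_sources = [1.0] * 4  # same document without keywords and content

print(expected_dest_boost(10.0, six_sources))
print(compounded_dest_boost(10.0, six_sources)
      / compounded_dest_boost(10.0, four_sources))
```

Under this model, dropping two sources divides the compounded boost by
docBoost^2, which matches the factor of 100 in the report.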

FWIW: we could add a ton more options to copyField to give more fine-grained 
control over stuff like this, if you'd like to file some Jiras for feature 
improvements along those lines -- but personally I think: a) update 
processors make more sense for stuff like this; b) people should move away 
from doc/field boosts and start doing more with functions on numeric fields 
(and ultimately DocValues fields), where you have a lot more control over 
this stuff.
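
As a rough illustration of option (b), a per-document weight can be stored in
a numeric field and applied at query time through the edismax `boost`
parameter, which multiplies the score by a function value. The field name
`doc_weight` and the helper below are assumptions made for this sketch; the
weight field would have to exist in the schema.

```python
from urllib.parse import urlencode

def boosted_query_params(user_query, weight_field):
    # Build request parameters for an edismax query whose score is
    # multiplied by the value of a numeric field (hypothetical field name).
    return urlencode({
        "defType": "edismax",
        "q": user_query,
        "qf": "text",
        "boost": weight_field,
    })

params = boosted_query_params("thomasegense", "doc_weight")
print(params)
```

Because the weight lives in an ordinary field, it can be changed or combined
with other functions without touching norms.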




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482915#comment-13482915
 ] 

Robert Muir commented on SOLR-3981:
---

that adoc() you are using doesn't work with boosts. (I found this from another 
test)





[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482765#comment-13482765
 ] 

Hoss Man commented on SOLR-3981:


Toke suggested in SOLR-3875...

{quote}
One solution would be to keep track of used fields (directly specified as well 
as copyFields) and only assign the full boost once per document. If the number 
of unique fields/document is low, a simple list would probably be the fastest 
and with low GC impact. For a higher number of unique fields, a Set might be 
better. An optimization would be to only create the tracking structure once a 
boost != 1.0f is encountered and only store the fields with boost != 1.0f, so 
that an update without boosts would not get a performance penalty.
{quote}
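
A minimal sketch of the tracking idea quoted above, simplified to the docBoost
only. It is not the eventual patch: the function name and the flat list of
destination field names are assumptions made for illustration.

```python
def apply_doc_boost_once(dest_field_names, doc_boost):
    """Pair each added value with a boost, giving each field name the
    docBoost only on its first value."""
    boosted = None  # tracking set, created lazily only when a boost is in play
    result = []
    for name in dest_field_names:
        boost = 1.0
        if doc_boost != 1.0:
            if boosted is None:
                boosted = set()
            if name not in boosted:
                boosted.add(name)
                boost = doc_boost
        result.append((name, boost))
    return result

# Each source value is followed by its copy into the "text" destination.
values = ["name", "text", "manu", "text", "cat", "text"]
pairs = apply_doc_boost_once(values, 10.0)
text_boosts = [b for n, b in pairs if n == "text"]
print(text_boosts)
```

With a docBoost of 1.0 the set is never allocated, so unboosted updates pay
only the comparison.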

I _was_ thinking that a more straightforward solution would be to build up the 
entire Document w/o any regard to the docBoost, and then only at the end loop 
over the fields in that Document and multiply in the docBoost if the field is 
indexed && !omitNorms -- but then I realized that at that level there is no 
general way to set the boost.

I'm working on a patch with a test demonstrating the problem ... that may help 
inform an appropriate solution.
