[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2013-03-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610648#comment-13610648
 ] 

Commit Tag Bot commented on SOLR-3981:
--

[branch_4x commit] Chris M. Hostetter
http://svn.apache.org/viewvc?view=revision&revision=1401920

SOLR-3988: Fixed SolrTestCaseJ4.adoc(SolrInputDocument) to respect field and 
document boosts

SOLR-3981: Fixed bug that resulted in document boosts being compounded in 
 destination fields

(merge r41401916)







> docBoost is compounded on copyField
> ---
>
> Key: SOLR-3981
> URL: https://issues.apache.org/jira/browse/SOLR-3981
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3981.patch, SOLR-3981.patch, SOLR-3981.patch
>
>
> As noted by Toke in a comment on SOLR-3875...
> https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233
> {quote}
> While boosting of multi-value fields is handled correctly in Solr 4.0.0, 
> boosting for copyFields is not. A sample document:
> {code}
> 
>   Insane score Example. Score = 10E9 
>   Document boost broken for copyFields
>   video ThomasEgense and Toke Eskildsen
>   Test
>   bug
>   something else
>   bug
>   bug
>   
> {code}
> The fields name, manu, cat, features, keywords and content get copied to 
> text, and a search for thomasegense matches the text field with this query 
> explanation:
> {code}
> 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result of:
>   70384.67 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     0.30685282 = idf(docFreq=1, maxDocs=1)
>     229376.0 = fieldNorm(doc=0)
> {code}
> If the last two fields, keywords and content, are removed from the sample 
> document, the score is reduced by a factor of 100 (docBoost^2).
> {quote}
> (This is a continuation of some of the problems caused by the changes made 
> when the concept of docBoost was eliminated from the underlying IndexWriter 
> code, and overlooked due to the lack of testing of docBoosts at the Solr 
> level - SOLR-3885)
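
For readers checking the numbers in the explanation quoted above: the reported score is the product tf * idf * fieldNorm. The sketch below reproduces that arithmetic. The class and method names are illustrative only and are not part of Solr or Lucene; the idf formula is the classic DefaultSimilarity one, 1 + ln(maxDocs / (docFreq + 1)).

```java
public class ExplainCheck {
    // Classic DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight as listed in the explanation: tf * idf * fieldNorm.
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        // Values taken from the quoted explanation output.
        double tf = 1.0;
        double idf = idf(1, 1);
        double fieldNorm = 229376.0;
        System.out.printf("idf = %.8f%n", idf);
        System.out.printf("fieldWeight = %.2f%n", fieldWeight(tf, idf, fieldNorm));
    }
}
```

With tf and idf both at or below 1, essentially the whole score comes from the fieldNorm, which is where the compounded boost ends up.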

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483561#comment-13483561
 ] 

Hoss Man commented on SOLR-3981:


tests & precommit look good ... unless anyone spots any problems i'll commit 
later today.




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483508#comment-13483508
 ] 

Hoss Man commented on SOLR-3981:


bq. that adoc() you are using doesn't work with boosts. (I found this from 
another test)

Grr... thanks rmuir, never would have even thought to check that ... easy fix.

bq. Applying the boosts once from all source fields for a given copyField 
destination seems a bit strange to me, but since it is old behaviour, I 
understand that it cannot be changed.

right ... copyField has always copied the _field_ boosts, the bug here is the 
compounded docBoost.

FWIW: we could add a ton more options to copyField to give more fine-grained 
control over stuff like this; feel free to file some Jiras for feature 
improvements along those lines -- but personally i think: a) update processors 
make more sense for stuff like this; b) people should move away from doc/field 
boosts and start doing more with functions on numeric fields (and ultimately 
DocValues fields), where you have a lot more control over this stuff.
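
To make the distinction between copied field boosts and a compounded docBoost concrete, here is a small arithmetic model. It is a sketch under stated assumptions, not the Solr code: it assumes each source value arrives at the copyField destination with its field boost already multiplied by the docBoost, and that the destination multiplies all contributions together. It ignores length normalization and the lossy norm encoding. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class CopyFieldBoostModel {
    // Model of the bug: every source contributes fieldBoost * docBoost,
    // so the docBoost is multiplied in once per source field.
    static float compounded(float docBoost, List<Float> sourceFieldBoosts) {
        float boost = 1.0f;
        for (float fieldBoost : sourceFieldBoosts) {
            boost *= fieldBoost * docBoost;
        }
        return boost;
    }

    // Model of the intended behaviour: field boosts still accumulate,
    // but the docBoost is applied exactly once to the destination.
    static float appliedOnce(float docBoost, List<Float> sourceFieldBoosts) {
        float boost = docBoost;
        for (float fieldBoost : sourceFieldBoosts) {
            boost *= fieldBoost;
        }
        return boost;
    }

    public static void main(String[] args) {
        // Hypothetical docBoost of 10, six unboosted source fields.
        List<Float> six = Arrays.asList(1f, 1f, 1f, 1f, 1f, 1f);
        List<Float> four = six.subList(0, 4);
        System.out.println("compounded, 6 sources: " + compounded(10f, six));
        System.out.println("compounded, 4 sources: " + compounded(10f, four));
        System.out.println("applied once:          " + appliedOnce(10f, six));
    }
}
```

In this model, dropping two source fields divides the compounded value by docBoost^2, which matches the factor of 100 reported in the issue if the docBoost was 10.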




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-24 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483047#comment-13483047
 ] 

Toke Eskildsen commented on SOLR-3981:
--

Thank you for investigating this so quickly, Hoss.

Applying the boosts once from all source fields for a given copyField 
destination seems a bit strange to me, but since it is old behaviour, I 
understand that it cannot be changed.




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482915#comment-13482915
 ] 

Robert Muir commented on SOLR-3981:
---

that adoc() you are using doesn't work with boosts. (I found this from another 
test)





[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482765#comment-13482765
 ] 

Hoss Man commented on SOLR-3981:


Toke suggested in SOLR-3875...

{quote}
One solution would be to keep track of used fields (directly specified as well 
as copyFields) and only assign the full boost once per document. If the number 
of unique fields/document is low, a simple list would probably be the fastest 
and with low GC impact. For a higher number of unique fields, a Set might be 
better. An optimization would be to only create the tracking structure once a 
boost != 1.0f is encountered and only store the fields with boost != 1.0f, so 
that an update without boosts would not get a performance penalty.
{quote}

I _was_ thinking that a more straightforward solution would be to build up the 
entire "Document" w/o any regard to the docBoost, and then only at the end loop 
over the fields in that Document and multiply in the docBoost if the field is 
indexed & !omitNorms -- but then i realized that at that level there is no 
general way to "set" the boost.

I'm working on a patch with a test demonstrating the problem ... that may help 
inform an appropriate solution.
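
The tracking idea quoted from Toke above could look roughly like the following. This is only a sketch of the suggestion, not the patch that was committed: DocBoostTracker and boostFor are hypothetical names, and the code assumes the caller asks for the effective boost of each value as it is added to a named field.

```java
import java.util.HashSet;
import java.util.Set;

public class DocBoostTracker {
    private final float docBoost;
    // Created lazily so that documents without a docBoost pay no extra cost.
    private Set<String> alreadyBoosted;

    public DocBoostTracker(float docBoost) {
        this.docBoost = docBoost;
    }

    // Returns the boost to use for one value added to fieldName. The docBoost
    // is multiplied in only the first time a given field name is seen.
    public float boostFor(String fieldName, float fieldBoost) {
        if (docBoost == 1.0f) {
            return fieldBoost; // nothing to track
        }
        if (alreadyBoosted == null) {
            alreadyBoosted = new HashSet<>();
        }
        // Set.add returns true only when the name was not present before.
        return alreadyBoosted.add(fieldName) ? fieldBoost * docBoost : fieldBoost;
    }

    public static void main(String[] args) {
        DocBoostTracker tracker = new DocBoostTracker(10f);
        System.out.println(tracker.boostFor("text", 1f)); // first value for "text"
        System.out.println(tracker.boostFor("text", 1f)); // later copy into "text"
    }
}
```

A HashSet is used here for brevity; as the quoted suggestion notes, a plain list may be cheaper when a document has only a handful of distinct fields.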
