[GitHub] [lucene-solr] dxl360 commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-23 Thread GitBox


dxl360 commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-732411622


   Had an offline discussion with @mikemccand. Maybe we can change the type of 
`invertState.length` from `int` to `long`, keep the current check on field 
length/termFreq accumulation, and safely cast the length back to `int` when 
calculating the norms. `totalTermFreq` and `sumTotalTermFreq` are both `long` 
and are not expected to be broken by `invertState.length`.
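
   A minimal sketch of that idea, for illustration only. The class and method 
names below are hypothetical and are not Lucene's actual code, and "safely 
cast" is assumed here to mean clamping to `Integer.MAX_VALUE` rather than 
letting the value wrap; throwing via `Math.toIntExact` would be another 
reasonable reading.

```java
// Hypothetical sketch: names are illustrative, not taken from Lucene.
final class FieldLengthSketch {
    // Held in a long so that adding int term frequencies cannot wrap.
    private long length;

    // Accumulate one token's term frequency into the field length.
    void addTermFreq(int termFreq) {
        if (termFreq < 1) {
            throw new IllegalArgumentException("termFreq must be >= 1");
        }
        length += termFreq;
    }

    long length() {
        return length;
    }

    // Assumption: "safely cast" means clamp to the int range instead of wrapping.
    int lengthForNorms() {
        return (int) Math.min(length, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        FieldLengthSketch state = new FieldLengthSketch();
        state.addTermFreq(Integer.MAX_VALUE);
        state.addTermFreq(Integer.MAX_VALUE);
        System.out.println(state.length());         // 4294967294
        System.out.println(state.lengthForNorms()); // 2147483647
    }
}
```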



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dxl360 commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-17 Thread GitBox


dxl360 commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-729475817


   The original implementation accumulates `int invertState.length` (number of 
tokens) by term frequency and will overflow if the term frequency is too large. 
Can we increment `length` by 1 for each token when we use custom term 
frequencies to hold arbitrary scoring signals (norms are disabled)? In this way, 
the number of tokens is bounded by 2,147,483,647 and `long 
totalTermFreq/sumTotalTermFreq` won't overflow.
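
   To make the difference concrete, here is a small sketch comparing the two 
ways of accumulating `length`. The class and method names are hypothetical, 
and the array of term frequencies stands in for the tokens of one field; this 
is not the code from the pull request.

```java
// Hypothetical sketch: names are illustrative, not taken from Lucene.
final class TokenCountSketch {
    // Adds each token's term frequency to an int; wraps silently on overflow.
    static int lengthByTermFreq(int[] termFreqs) {
        int length = 0;
        for (int tf : termFreqs) {
            length += tf;
        }
        return length;
    }

    // Same accumulation, but fails fast instead of wrapping.
    static int lengthByTermFreqChecked(int[] termFreqs) {
        int length = 0;
        for (int tf : termFreqs) {
            length = Math.addExact(length, tf); // throws ArithmeticException on overflow
        }
        return length;
    }

    // Proposed alternative: one increment per token, regardless of its frequency.
    static int lengthByTokenCount(int[] termFreqs) {
        int length = 0;
        for (int ignored : termFreqs) {
            length = Math.addExact(length, 1);
        }
        return length;
    }

    public static void main(String[] args) {
        int[] termFreqs = {Integer.MAX_VALUE, 1}; // two tokens with large custom frequencies
        System.out.println(lengthByTermFreq(termFreqs));   // -2147483648 (wrapped)
        System.out.println(lengthByTokenCount(termFreqs)); // 2
        try {
            lengthByTermFreqChecked(termFreqs);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```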






[GitHub] [lucene-solr] dxl360 commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-17 Thread GitBox


dxl360 commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-729332268


   > I'm concerned about this change: other things will overflow if you have 
too many term frequencies in a field. Currently frequency is bounded by 2^32-1 
within a doc, and you can only have 2^32-1 documents in the index, so stats 
like `totalTermFreq` and `sumTotalTermFreq` can't overflow. But with this 
change it would be easy to do this and break scoring, fail checkindex, etc.
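
   The bound in the quoted comment can be checked with a little arithmetic. 
The sketch below is an illustration with hypothetical names; it assumes both 
the per-document frequency sum and the document count are capped at Java's 
`Integer.MAX_VALUE` (2^31-1, the 2,147,483,647 figure mentioned elsewhere in 
this thread), which is the largest value an `int` can hold.

```java
// Hypothetical sketch: names are illustrative, not taken from Lucene.
final class StatsBoundSketch {
    // Worst case: every document contributes the maximum per-document frequency sum.
    static long worstCaseTotal() {
        // multiplyExact would throw ArithmeticException if the product overflowed a long.
        return Math.multiplyExact((long) Integer.MAX_VALUE, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long total = worstCaseTotal();
        System.out.println(total);                  // 4611686014132420609
        System.out.println(total < Long.MAX_VALUE); // true
    }
}
```

   Under those assumptions the worst-case sum fits in a `long` with room to 
spare; if the per-document sum were allowed to exceed the `int` range, that 
guarantee would no longer follow from this argument.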
   
   


