mikemccand commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-744596272


   I now understand @rmuir 's concern: because today we force sum of term freq 
within a single document to fit in `int` (during this `invertState.length` 
accumulation for norms), and because Lucene allows at most `Integer.MAX_VALUE` 
documents, we know that `totalTermFreq` (per term sum of term freq across all 
docs) cannot possibly overflow `long`.
   
   Hmm, but I think `sumTotalTermFreq`, which is per field sum of all 
`totalTermFreq` across all terms in that field, could overflow `long` even 
today, in and adversarial case.  And it would not be detected by Lucene...
   
   I think it is weird to rely on norms (doc length) accumulation to prevent 
overflow of `totalTermFreq` and `sumTotalTermFreq`, especially if norms are in 
fact disabled for that field?
   
   How about decoupling these two problems?  First, let's fix the aggregation 
of `totalTermFreq` and `sumTotalTermFreq` to explicitly catch any overflow 
instead of just doing the dangerous `+=` today: 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/PushPostingsWriterBase.java#L142
 and 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java#L915?
  I.e. switch these accumluations to `Math.addExact`.  This will explicitly 
catch `long` overflow for either of these stats.
   
   Second, let's not accumulate `invertState.length` if norms are disabled 
(@dxl360 's current PR).
   
   I think this would solve both issues well -- prevent `long` overflow, and 
allow large custom term frequencies in one document when norms are disabled.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to