benwtrent opened a new pull request, #11905:
URL: https://github.com/apache/lucene/pull/11905
This bug has been around since 9.1. It relates directly to the number of
nodes contained in level 0 of the HNSW graph. Since level 0 contains all
the nodes, this implies the following:
- In Lucene 9.1, the bug probably would have appeared once `31580641`
(Integer.MAX_VALUE/(maxConn + 1)) vectors were in a single segment
- In Lucene 9.2+, the bug appears once there are `16268814`
(Integer.MAX_VALUE/(M * 2 + 1)) or more vectors in a single segment
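The thresholds above fall out of plain int arithmetic. A minimal sketch (illustrative `M`, not Lucene's default, and hypothetical method names) of how the per-node offset wraps negative once the node count crosses `Integer.MAX_VALUE / (M * 2 + 1)`:

```java
// Sketch of the overflow: level 0 stores M * 2 + 1 ints per node, so a
// node's index into that storage is node * (M * 2 + 1). Computed in int
// arithmetic, that product wraps negative once node exceeds
// Integer.MAX_VALUE / (M * 2 + 1).
class Level0Offsets {
    // Buggy variant: the product is computed in int and can wrap negative.
    static long intOffset(int node, int m) {
        return node * (m * 2 + 1); // int overflow happens before widening
    }

    // Fixed variant: widen to long before multiplying.
    static long longOffset(int node, int m) {
        return (long) node * (m * 2 + 1);
    }

    public static void main(String[] args) {
        int m = 65; // illustrative M, chosen only to keep the numbers small
        int firstBadNode = Integer.MAX_VALUE / (m * 2 + 1) + 1;
        System.out.println(intOffset(firstBadNode, m));  // negative: the bug
        System.out.println(longOffset(firstBadNode, m)); // correct positive offset
    }
}
```

The negative offset is what later reaches `seek` and surfaces as the EOF failure.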
The stack trace would indicate an EOF failure as Lucene attempts to `seek`
to a negative number in `ByteBufferIndexInput`.
This commit fixes the type casting and utilizes the overflow-checked
`Math.multiplyExact`/`Math.addExact` methods for the multiplication and
addition. The overhead here is minimal, as these calculations are done in
constructors and the results are used repeatedly afterwards.
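A minimal sketch of that pattern (the class and method names here are hypothetical, not Lucene's actual API): compute the per-node size once with the exact methods so overflow throws instead of silently wrapping, and widen to long before the per-node multiplication:

```java
// Hypothetical helper mirroring the fix's pattern; GraphOffsets and its
// members are illustrative, not the actual Lucene code.
class GraphOffsets {
    private final long bytesPerNode; // computed once in the constructor

    GraphOffsets(int m) {
        // Math.multiplyExact/addExact throw ArithmeticException on overflow
        // instead of silently wrapping to a negative value.
        this.bytesPerNode = Math.multiplyExact(
            (long) Math.addExact(Math.multiplyExact(m, 2), 1), Integer.BYTES);
    }

    long offsetOfNode(int node) {
        // Widen before multiplying so large node ids stay positive.
        return Math.multiplyExact((long) node, bytesPerNode);
    }
}
```

With `m = 65` and twenty million nodes, the equivalent int product would wrap negative, while `offsetOfNode` returns the correct positive long.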
I put fixes in the older codecs; I don't know if that is typically done, but
if somebody has a large segment and wants to read the vectors, they could build
this jar and read them now (the bug is only on read, and the data layout is
unchanged).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]