Re: [PR] Fix undercounting of RAM used by vectors buffered in in-memory segments [lucene]

via GitHub Thu, 14 May 2026 06:28:18 -0700


iprithv commented on code in PR #15982:
URL: https://github.com/apache/lucene/pull/15982#discussion_r3197571482



##########
lucene/core/src/java/org/apache/lucene/codecs/BufferingKnnVectorsWriter.java:
##########
@@ -260,7 +260,7 @@ public final long ramBytesUsed() {
               * (long)
                   (RamUsageEstimator.NUM_BYTES_OBJECT_REF
                       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER)
-          + vectors.size() * (long) dim * Float.BYTES;
+          + vectors.size() * (long) dim * fieldInfo.getVectorEncoding().byteSize;

Review Comment:
   Yes, exactly. Before the fix, ramBytesUsed() always multiplied by
Float.BYTES (4) regardless of encoding, so the buffered vectors of a byte[]
vector field were reported at 4x their actual memory cost, which caused
IndexWriter to flush up to 4x too early for byte-encoded vector fields. With
this change, it multiplies by fieldInfo.getVectorEncoding().byteSize instead,
which is 1 for BYTE and 4 for FLOAT32, giving the correct cost in both cases.
Thanks @mikemccand!
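
To make the arithmetic concrete, here is a minimal standalone sketch of the vector payload term only, before and after the change. It does not use Lucene: the `Encoding` enum is a hypothetical stand-in for `VectorEncoding` that models only `byteSize`, and the class name, method names, vector count and dimension are illustrative assumptions rather than anything from the PR.

```java
public class VectorRamEstimateSketch {

  /** Hypothetical stand-in for Lucene's VectorEncoding; only the per-dimension byte size is modeled. */
  public enum Encoding {
    BYTE(1),
    FLOAT32(Float.BYTES);

    public final int byteSize;

    Encoding(int byteSize) {
      this.byteSize = byteSize;
    }
  }

  /** Payload estimate before the fix: always assumes 4 bytes per dimension. */
  public static long payloadBytesBeforeFix(int numVectors, int dim) {
    return numVectors * (long) dim * Float.BYTES;
  }

  /** Payload estimate after the fix: uses the per-dimension size of the field's encoding. */
  public static long payloadBytesAfterFix(int numVectors, int dim, Encoding encoding) {
    return numVectors * (long) dim * encoding.byteSize;
  }

  public static void main(String[] args) {
    int numVectors = 1_000; // illustrative values, not taken from the PR
    int dim = 768;

    long before = payloadBytesBeforeFix(numVectors, dim);
    long afterByte = payloadBytesAfterFix(numVectors, dim, Encoding.BYTE);
    long afterFloat = payloadBytesAfterFix(numVectors, dim, Encoding.FLOAT32);

    System.out.println("before fix (any encoding): " + before);
    System.out.println("after fix (BYTE):          " + afterByte);
    System.out.println("after fix (FLOAT32):       " + afterFloat);
  }
}
```

With these inputs, the old formula reports 3,072,000 bytes for either encoding, while the new one reports 768,000 bytes for BYTE and leaves FLOAT32 unchanged at 3,072,000. The object-reference and array-header overhead from the surrounding expression is deliberately left out of the sketch.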



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


