Re: [PR] Fix undercounting of RAM used by vectors buffered in in-memory segments [lucene]

via GitHub Fri, 01 May 2026 08:56:03 -0700


shubhamvishu commented on code in PR #15982:
URL: https://github.com/apache/lucene/pull/15982#discussion_r3173836865



##########
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##########
@@ -819,6 +822,14 @@ ScalarQuantizer createQuantizer() throws IOException {
       return quantizer;
     }
 
+    /**
+     * Returns the RAM usage of quantization-specific state only. The underlying flat vector data is
+     * tracked separately by the rawVectorDelegate at the writer level.
+     */
+    long quantizationOverheadBytesUsed() {
+      return SHALLOW_SIZE;
+    }
+
     @Override
     public long ramBytesUsed() {
       long size = SHALLOW_SIZE;

Review Comment:
   Should this be removed, as in Lucene102BinaryQuantizedVectorsWriter, so it's 
not double counted in both `#ramBytesUsed` and `quantizationOverheadBytesUsed`?



##########
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene102/Lucene102BinaryQuantizedVectorsWriter.java:
##########
@@ -714,11 +718,22 @@ public float[] copyValue(float[] vectorValue) {
       throw new UnsupportedOperationException();
     }
 
+    /**
+     * Returns the RAM usage of quantization-specific state only (magnitudes, dimensionSums, shallow
+     * object overhead). The underlying flat vector data is tracked separately by the
+     * rawVectorDelegate at the writer level to avoid double-counting.
+     */
+    long quantizationOverheadBytesUsed() {
+      long size = SHALLOW_SIZE;
+      size += magnitudes.ramBytesUsed();
+      size += RamUsageEstimator.sizeOf(dimensionSums);
+      return size;
+    }
+
     @Override
     public long ramBytesUsed() {
-      long size = SHALLOW_SIZE;
+      long size = quantizationOverheadBytesUsed();
       size += flatFieldVectorsWriter.ramBytesUsed();

Review Comment:
   So the raw delegate above would now be responsible for accounting for the 
vector data for both floats and bytes, and hence we switched to calling only 
the overhead part here? But then will we not double count it for floats here, 
with both `flatFieldVectorsWriter.ramBytesUsed` and 
`rawVectorDelegate.ramBytesUsed` (the newly added one)?
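
   To make the concern concrete, here is a minimal sketch of the two ways the 
totals could be summed. Everything in it is a hypothetical stand-in, not the 
actual Lucene writers: `FlatWriter` plays the role of the raw delegate, 
`QuantizedFieldWriter` plays the role of the per-field writer, and the 
`SHALLOW_SIZE` value and byte estimates are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins only; none of these classes are Lucene's API. */
class RamAccountingSketch {

  /** Plays the role of the raw flat vector writer (the delegate). */
  static final class FlatWriter {
    private final List<float[]> vectors = new ArrayList<>();

    void add(float[] vector) {
      vectors.add(vector);
    }

    /** Rough estimate: payload bytes only, ignoring array headers. */
    long ramBytesUsed() {
      long size = 0;
      for (float[] v : vectors) {
        size += (long) v.length * Float.BYTES;
      }
      return size;
    }
  }

  /** Plays the role of a quantized per-field writer that wraps the flat writer. */
  static final class QuantizedFieldWriter {
    static final long SHALLOW_SIZE = 16; // made-up constant for illustration
    private final FlatWriter flat;
    private final float[] dimensionSums;

    QuantizedFieldWriter(FlatWriter flat, int dims) {
      this.flat = flat;
      this.dimensionSums = new float[dims];
    }

    /** Quantization-specific state only. */
    long quantizationOverheadBytesUsed() {
      return SHALLOW_SIZE + (long) dimensionSums.length * Float.BYTES;
    }

    /** Overhead plus the wrapped flat vector data. */
    long ramBytesUsed() {
      return quantizationOverheadBytesUsed() + flat.ramBytesUsed();
    }
  }

  /** Writer-level sum that adds the flat data twice. */
  static long doubleCounted(FlatWriter delegate, List<QuantizedFieldWriter> fields) {
    long total = delegate.ramBytesUsed();
    for (QuantizedFieldWriter f : fields) {
      total += f.ramBytesUsed(); // includes the delegate's data again
    }
    return total;
  }

  /** Writer-level sum that adds the flat data once, plus per-field overhead. */
  static long countedOnce(FlatWriter delegate, List<QuantizedFieldWriter> fields) {
    long total = delegate.ramBytesUsed();
    for (QuantizedFieldWriter f : fields) {
      total += f.quantizationOverheadBytesUsed();
    }
    return total;
  }

  public static void main(String[] args) {
    FlatWriter flat = new FlatWriter();
    flat.add(new float[4]);
    flat.add(new float[4]);
    List<QuantizedFieldWriter> fields = List.of(new QuantizedFieldWriter(flat, 4));
    System.out.println("double counted: " + doubleCounted(flat, fields));
    System.out.println("counted once:   " + countedOnce(flat, fields));
  }
}
```

   If the writer-level sum looks like `countedOnce`, then the field-level 
`ramBytesUsed` can keep including the flat data without inflating the total. 
If it looks like `doubleCounted`, the float data would be reported twice.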




