iprithv commented on code in PR #15982:
URL: https://github.com/apache/lucene/pull/15982#discussion_r3197561777


##########
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene102/Lucene102BinaryQuantizedVectorsWriter.java:
##########
@@ -714,11 +718,22 @@ public float[] copyValue(float[] vectorValue) {
       throw new UnsupportedOperationException();
     }
 
+    /**
+     * Returns the RAM usage of quantization-specific state only (magnitudes, dimensionSums, shallow
+     * object overhead). The underlying flat vector data is tracked separately by the
+     * rawVectorDelegate at the writer level to avoid double-counting.
+     */
+    long quantizationOverheadBytesUsed() {
+      long size = SHALLOW_SIZE;
+      size += magnitudes.ramBytesUsed();
+      size += RamUsageEstimator.sizeOf(dimensionSums);
+      return size;
+    }
+
     @Override
     public long ramBytesUsed() {
-      long size = SHALLOW_SIZE;
+      long size = quantizationOverheadBytesUsed();
       size += flatFieldVectorsWriter.ramBytesUsed();

Review Comment:
   Yes, rawVectorDelegate is now the single source of truth for all flat vector data (both byte and float32).
   
   No double counting happens. FieldWriter.flatFieldVectorsWriter is the same Java object that rawVectorDelegate holds internally as the per-field writer: it is the object returned by this.rawVectorDelegate.addField(fieldInfo), which is then passed into new FieldWriter(fieldInfo, rawVectorDelegate). So rawVectorDelegate.ramBytesUsed() already accounts for those float vectors.
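   To illustrate the object-identity point, here is a minimal, self-contained sketch. FlatDelegate, FlatFieldWriter and this FieldWriter are hypothetical stand-ins, not Lucene's actual classes; they only model the assumption that the delegate keeps the per-field writer it creates and hands that same reference to the FieldWriter, so anything buffered through the FieldWriter is visible to (and counted by) the delegate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT Lucene's real classes.
public class SharedFlatWriterSketch {

  /** Stand-in for the per-field flat writer that buffers raw vectors. */
  static final class FlatFieldWriter {
    final List<float[]> vectors = new ArrayList<>();

    long ramBytesUsed() {
      // Count only the float payload: Float.BYTES (4) per component.
      long bytes = 0;
      for (float[] v : vectors) {
        bytes += (long) v.length * Float.BYTES;
      }
      return bytes;
    }
  }

  /** Stand-in for rawVectorDelegate: owns every per-field flat writer it creates. */
  static final class FlatDelegate {
    final List<FlatFieldWriter> fields = new ArrayList<>();

    FlatFieldWriter addField(String fieldName) {
      FlatFieldWriter w = new FlatFieldWriter();
      fields.add(w); // the delegate keeps a reference...
      return w;      // ...and returns the very same object to the caller
    }

    long ramBytesUsed() {
      long total = 0;
      for (FlatFieldWriter w : fields) {
        total += w.ramBytesUsed();
      }
      return total;
    }
  }

  /** Stand-in for the quantized FieldWriter: it wraps, never copies, the flat writer. */
  static final class FieldWriter {
    final FlatFieldWriter flatFieldVectorsWriter;

    FieldWriter(FlatFieldWriter flat) {
      this.flatFieldVectorsWriter = flat;
    }
  }

  public static void main(String[] args) {
    FlatDelegate delegate = new FlatDelegate();
    FieldWriter fieldWriter = new FieldWriter(delegate.addField("vec"));

    // Buffer one 4-dimensional vector through the FieldWriter.
    fieldWriter.flatFieldVectorsWriter.vectors.add(new float[] {1f, 2f, 3f, 4f});

    // Same object, so the delegate already sees (and counts) that vector.
    System.out.println(fieldWriter.flatFieldVectorsWriter == delegate.fields.get(0)); // true
    System.out.println(delegate.ramBytesUsed()); // 16
  }
}
```

   Because the FieldWriter only holds a reference, there is exactly one copy of the vector data, and it is reachable from the delegate.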
   
   The writer-level loop then calls field.quantizationOverheadBytesUsed(), which counts only the FieldWriter shell + magnitudes + dimensionSums, NOT flatFieldVectorsWriter. FieldWriter.ramBytesUsed() (which does include flatFieldVectorsWriter.ramBytesUsed()) is never called from the writer-level accounting; it exists solely to satisfy the Accountable interface. So each byte of flat float data is counted exactly once, through rawVectorDelegate.
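   A rough sketch of the resulting arithmetic, assuming the writer-level total is simply rawVectorDelegate.ramBytesUsed() plus each field's quantizationOverheadBytesUsed(). The FieldWriter record, the helper method names and all byte sizes below are hypothetical illustration values, not Lucene code or measurements; the second helper shows what summing FieldWriter.ramBytesUsed() instead would produce.

```java
import java.util.List;

// Hypothetical model of the writer-level accounting; all sizes are made-up inputs.
public class WriterRamAccountingSketch {

  /** Stand-in per-field writer; flatBytes lives in an object the delegate also tracks. */
  record FieldWriter(long shallowSize, long magnitudesBytes, long dimensionSumsBytes, long flatBytes) {

    /** Quantization-only state: shell + magnitudes + dimensionSums. */
    long quantizationOverheadBytesUsed() {
      return shallowSize + magnitudesBytes + dimensionSumsBytes;
    }

    /** Full Accountable view, which also includes the shared flat writer. */
    long ramBytesUsed() {
      return quantizationOverheadBytesUsed() + flatBytes;
    }
  }

  /** Writer-level total: the delegate once, plus each field's quantization overhead. */
  static long writerRamBytesUsed(long delegateBytes, List<FieldWriter> fields) {
    long total = delegateBytes;
    for (FieldWriter field : fields) {
      total += field.quantizationOverheadBytesUsed();
    }
    return total;
  }

  /** What the total would be if FieldWriter.ramBytesUsed() were summed instead. */
  static long doubleCountedRamBytesUsed(long delegateBytes, List<FieldWriter> fields) {
    long total = delegateBytes;
    for (FieldWriter field : fields) {
      total += field.ramBytesUsed();
    }
    return total;
  }

  public static void main(String[] args) {
    List<FieldWriter> fields =
        List.of(new FieldWriter(32, 400, 64, 4096), new FieldWriter(32, 200, 32, 2048));
    // The delegate already covers both fields' flat data: 4096 + 2048 = 6144.
    long delegateBytes = 4096 + 2048;
    System.out.println(writerRamBytesUsed(delegateBytes, fields));        // 6904
    System.out.println(doubleCountedRamBytesUsed(delegateBytes, fields)); // 13048
  }
}
```

   With these numbers the correct total is 6144 + 496 + 264 = 6904 bytes; summing FieldWriter.ramBytesUsed() would add the 6144 flat bytes a second time.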
   
   Thanks @shubhamvishu!
