GWphua commented on code in PR #18731:
URL: https://github.com/apache/druid/pull/18731#discussion_r2712085454


##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedBufferHashGrouper.java:
##########
@@ -571,7 +585,19 @@ public void adjustTableWhenFull()
 
       size = numCopied;
       tableBuffer = newTableBuffer;
+      updateMaxTableBufferUsedBytes();
       growthCount++;
     }
+
+    @Override
+    protected void updateMaxTableBufferUsedBytes()
+    {
+      long currentBufferUsedBytes = 0;
+      for (ByteBuffer buffer : subHashTableBuffers) {
+        currentBufferUsedBytes += buffer.capacity();
+      }

Review Comment:
   Hello, I have added the tests for the groupers.
   
   I did not get the same results as you did, possibly because my queries ran against a smaller dataset.
   
   What I did in my tests was to query with spill-to-disk enabled (the two buffer configurations are sketched below):
   1. Set druid.processing.buffer.sizeBytes = 1GB.
   2. Query a dataset (say the results come to ~100MB).
   3. Set druid.processing.buffer.sizeBytes to a much smaller value, ~5MB.
   4. Query the same dataset, and watch the usage metrics cap at 5MB, with ~95MB spilled to disk.
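   
   Concretely, the only thing that changes between the two runs is the processing buffer size in `runtime.properties`; the byte values below are just the 1GB/5MB figures from the steps above:
   ```properties
   # Run 1: large processing buffer (1 GiB). Assuming the grouped results fit
   # in memory (~100MB in this example), nothing should spill to disk.
   druid.processing.buffer.sizeBytes=1073741824
   
   # Run 2 (separate restart): shrink the buffer to ~5 MiB. The usage metrics
   # should cap at ~5MB, with the remaining ~95MB spilling to disk.
   druid.processing.buffer.sizeBytes=5242880
   ```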
   
   Here's an example of what my max metrics look like:
   <img alt="image" src="https://github.com/user-attachments/assets/d6d59e5a-7dc1-4d4a-b067-f25fdbbd8c71" />
   
   I do have to admit, some of the values are a bit "blocky": the metric reports ~28MB for, say, three consecutive emissions, then jumps to some other value. Maybe this is because similar queries are sent within a short period of time, so the allocated space ends up the same for those similar queries. Hopefully this will be fixed by your catch of reporting the usage instead of the capacity. 😄
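   
   For context, here is a minimal sketch of how the truncated override in the diff above might read once completed; the `maxTableBufferUsedBytes` high-water-mark field is my assumption for illustration, not necessarily the PR's actual code:
   ```java
   @Override
   protected void updateMaxTableBufferUsedBytes()
   {
     // Sum the size of each sub-hash-table buffer. As noted above, this sums
     // capacity(); it may be changed to report actual used bytes instead.
     long currentBufferUsedBytes = 0;
     for (ByteBuffer buffer : subHashTableBuffers) {
       currentBufferUsedBytes += buffer.capacity();
     }
     // Record the high-water mark (field name assumed here for illustration).
     maxTableBufferUsedBytes = Math.max(maxTableBufferUsedBytes, currentBufferUsedBytes);
   }
   ```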


