Copilot commented on code in PR #17489:
URL: https://github.com/apache/pinot/pull/17489#discussion_r2681519392


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -1009,7 +1009,7 @@ private void handleSegmentBuildFailure() {
   private boolean startSegmentCommit() {
     SegmentCompletionProtocol.Request.Params params = new SegmentCompletionProtocol.Request.Params();
     params.withSegmentName(_segmentNameStr).withStreamPartitionMsgOffset(_currentOffset.toString())
-        .withNumRows(_numRowsConsumed).withInstanceId(_instanceId).withReason(_stopReason);
+        .withNumRows(_numRowsIndexed).withInstanceId(_instanceId).withReason(_stopReason);

Review Comment:
   The change from `_numRowsConsumed` to `_numRowsIndexed` in the segment commit flow lacks test coverage. This is a critical bug fix affecting memory allocation and segment sizing. Add integration tests that verify correct behavior when filtering results in a significant difference between consumed and indexed rows (e.g., a 60x ratio).
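
   To make the requested coverage concrete, here is a minimal, self-contained sketch of the consumed-vs-indexed divergence such a test should exercise. The class name and numbers are illustrative assumptions, not Pinot code:

   ```java
   // Illustrative only: shows the divergence an ingestion filter creates between
   // the rows consumed from the stream and the rows actually indexed into the
   // segment. A commit reporting the consumed count overstates the segment's
   // row count by exactly this ratio.
   public class RowCountDivergence {

       // Factor by which reporting _numRowsConsumed instead of _numRowsIndexed
       // would overstate the segment's row count at commit time.
       static double filterRatio(long numRowsConsumed, long numRowsIndexed) {
           return (double) numRowsConsumed / numRowsIndexed;
       }

       public static void main(String[] args) {
           long consumed = 12_000_000L; // messages read from the stream (assumed)
           long indexed = 200_000L;     // rows surviving the filter (assumed)
           // A test should assert the commit params carry the indexed count,
           // even at a large ratio like this 60x one.
           System.out.printf("filter ratio: %.0fx%n", filterRatio(consumed, indexed));
       }
   }
   ```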



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/segment/SizeBasedSegmentFlushThresholdComputer.java:
##########
@@ -171,9 +171,9 @@ synchronized int computeThreshold(StreamConfig streamConfig, String segmentName)
     // .getSizeThresholdToFlushSegment(),
     // we might end up using a lot more memory than required for the segment Using a minor bump strategy, until
     // we add feature to adjust time We will only slightly bump the threshold based on numRowsConsumed
-    if (_rowsConsumedForLastSegment < _rowsThresholdForLastSegment && _sizeForLastSegment < desiredSegmentSizeBytes) {
+    if (_rowsIndexedForLastSegment < _rowsThresholdForLastSegment && _sizeForLastSegment < desiredSegmentSizeBytes) {

Review Comment:
   The threshold computation logic using `_rowsIndexedForLastSegment` needs test coverage demonstrating that the fix prevents memory spikes. Add tests comparing memory allocation with high filter ratios (e.g., 200K indexed rows from 13M consumed) to verify the corrected calculation prevents over-allocation.
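
   As a rough illustration of the memory-spike scenario the tests should cover, the sketch below uses an assumed simplification of the threshold logic with made-up numbers; `nextThreshold` is not the real Pinot method. Comparing the wrong row count against the threshold skips the gentle-bump branch and extrapolates a much larger next threshold:

   ```java
   // Assumed simplification of size-based flush threshold computation, not the
   // actual SizeBasedSegmentFlushThresholdComputer implementation.
   public class ThresholdSketch {

       static int nextThreshold(long rowsForLastSegment, int rowsThresholdForLastSegment,
                                long sizeForLastSegment, long desiredSegmentSizeBytes) {
           if (rowsForLastSegment < rowsThresholdForLastSegment
               && sizeForLastSegment < desiredSegmentSizeBytes) {
               // Segment flushed below threshold and below target size:
               // bump the threshold only slightly.
               return (int) (rowsThresholdForLastSegment * 1.1);
           }
           // Otherwise extrapolate rows proportionally toward the desired size.
           return (int) (rowsForLastSegment * desiredSegmentSizeBytes / sizeForLastSegment);
       }

       public static void main(String[] args) {
           long sizeBytes = 50_000_000L;     // last segment size: 50 MB (assumed)
           long desiredBytes = 500_000_000L; // target segment size: 500 MB (assumed)
           int threshold = 13_000_000;       // last row threshold (assumed)
           // Consumed rows (13M, at threshold) take the extrapolation branch
           // and inflate the next threshold to 130M rows:
           System.out.println(nextThreshold(13_000_000L, threshold, sizeBytes, desiredBytes));
           // Indexed rows (200K, under threshold) take the gentle bump instead:
           System.out.println(nextThreshold(200_000L, threshold, sizeBytes, desiredBytes));
       }
   }
   ```

   A regression test along these lines can assert that, with a high filter ratio, the computed threshold grows only by the minor bump rather than by the extrapolated factor.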



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

