somandal commented on code in PR #15175:
URL: https://github.com/apache/pinot/pull/15175#discussion_r1978375341


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -963,6 +974,15 @@ private Pair<InstancePartitions, Boolean> 
getInstancePartitionsForTier(TableConf
     }
   }
 
+  private DiskUsageInfo getDiskUsageInfoOfInstance(String instanceId) {
+    return ResourceUtilizationInfo.getDiskUsageInfo(instanceId);
+  }
+
+  private long getSegmentsSizeInBytes(Set<String> segments) {
+    // TODO: replace with table API to get total segment size
+    return 1000L * segments.size();

Review Comment:
   I calculate the average segment size based on TableSize API. Can you use 
that to calculate this?
   It is okay for this to be an estimate for now (we can enhance this later to 
use exact segment sizes if we need the accuracy - TableSize API does provide 
per segment information)
   
   ```
    long tableSizePerReplicaInBytes = 
calculateTableSizePerReplicaInBytes(tableNameWithType);
       long averageSegmentSizeInBytes = tableSizePerReplicaInBytes <= 0 ? 
tableSizePerReplicaInBytes
           : tableSizePerReplicaInBytes / ((long) currentAssignment.size());
   ```



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -677,25 +680,33 @@ private RebalanceSummaryResult 
calculateDryRunSummary(Map<String, Map<String, St
       } else {
         serversAdded.add(server);
       }
-      newSegmentSet.removeAll(existingSegmentSet);
+      newSegmentSet.removeAll(intersection);
+      Set<String> removedSegmentSet = new HashSet<>(existingSegmentSet);
+      removedSegmentSet.removeAll(intersection);
       int segmentsAdded = newSegmentSet.size();
       if (segmentsAdded > 0) {
         serversGettingNewSegments.add(server);
       }
       maxSegmentsAddedToServer = Math.max(maxSegmentsAddedToServer, 
segmentsAdded);
-      int segmentsDeleted = existingSegmentSet.size() - segmentsUnchanged;
+      int segmentsDeleted = removedSegmentSet.size();
+
+      long diskSizeChange = getSegmentsSizeInBytes(newSegmentSet) - 
getSegmentsSizeInBytes(removedSegmentSet);

Review Comment:
   Please see the discussion with @Jackie-Jiang on how this should be 
calculated. We want a worst case estimate as well (maybe you can add an 
additional parameter for what worst case will be)
   
   In the worst case we will have `currentDiskUsageInBytes + 
segmentSizeInBytesToBeAdded`, if the segments get added before the segments to 
be removed are deleted from the server.
   
   If `lowDiskMode` is enabled, we should be able to use a more exact size 
since the segments to be removed will be removed first. In that case we can 
assume worst case to be: `currentDiskUsageInBytes + segmentSizeInBytesToBeAdded 
- segmentSizeInBytesToBeRemoved`.



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceSummaryResult.java:
##########
@@ -86,12 +89,18 @@ public static class ServerSegmentChangeInfo {
     public ServerSegmentChangeInfo(@JsonProperty("serverStatus") ServerStatus 
serverStatus,
         @JsonProperty("totalSegmentsAfterRebalance") int 
totalSegmentsAfterRebalance,
         @JsonProperty("totalSegmentsBeforeRebalance") int 
totalSegmentsBeforeRebalance,
+        @JsonProperty("diskUsedAfterRebalance") long diskUsedAfterRebalance,

Review Comment:
   - We should also have a field to track the worst case in size increase that 
we can calculate. Have left a comment about how to do this where you do the 
calculations
   - It may also be a good idea to have a % utilization after rebalance for 
both after final rebalance and worst case scenario
   
   One more option may be to add a flag the `ServerInfo` for a quick check or 
something:
   - boolean isAnyServerWorstCaseUtilizationAbove80pctDuringRebalance



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -963,6 +974,15 @@ private Pair<InstancePartitions, Boolean> 
getInstancePartitionsForTier(TableConf
     }
   }
 
+  private DiskUsageInfo getDiskUsageInfoOfInstance(String instanceId) {
+    return ResourceUtilizationInfo.getDiskUsageInfo(instanceId);

Review Comment:
   how often is this updated? It may be a good idea to compute this on the spot 
to give a more accurate value if that is possible
   
   Over time as rebalance runs, we can still run into running out of disk if a 
large number of segments get added during the rebalance, so this is indeed just 
an estimate. If it is too hard to have a more updated value, let's add a 
comment indicating that limitation and some details on how old this value can 
be. we can discuss this some more



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to