smengcl opened a new pull request, #10339:
URL: https://github.com/apache/ozone/pull/10339

   Generated-by: Claude Code (Opus 4.7)
   
   ## What changes were proposed in this pull request?
   
   `SCMCommonPlacementPolicy.getMaxReplicasPerRack` divides by `numberOfRacks` without checking for zero. Its caller, `validateContainerPlacement`, reaches that division with `Math.min(requiredRacks, numRacks)` as the divisor. When the network topology transiently reports zero racks (observed during a DN decommission), the divisor becomes 0, and the existing `requiredRacks == 1` short-circuit does not catch it. The `ReplicationMonitor` catches the resulting exception and calls:
   ```
   ExitUtil.terminate(1, t)
   ```
   so the SCM JVM exits.
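A minimal sketch of the failure path, under stated assumptions: the class `ZeroRackRepro` and both of its methods are hypothetical simplifications, not the actual SCM code. The rack count is passed in as a parameter instead of being read from the topology, and the per-rack limit is assumed to be a ceiling division. The real `getMaxReplicasPerRack` may compute it differently, but any integer division by `numberOfRacks` fails the same way.

```java
// Hypothetical, simplified model of the pre-fix logic; not the real Ozone code.
final class ZeroRackRepro {

  // Assumed formula: ceiling of numReplicas / numberOfRacks.
  // Integer division by zero throws ArithmeticException("/ by zero").
  static int getMaxReplicasPerRack(int numReplicas, int numberOfRacks) {
    return (numReplicas + numberOfRacks - 1) / numberOfRacks;
  }

  // Only requiredRacks == 1 short-circuits, so a topology that
  // reports zero racks still reaches the division.
  static int maxReplicasPerRackForValidation(int numReplicas, int requiredRacks, int numRacks) {
    if (requiredRacks == 1) {
      return numReplicas; // early return: no per-rack limit to enforce
    }
    int expectedRacks = Math.min(requiredRacks, numRacks); // 0 when numRacks == 0
    return getMaxReplicasPerRack(numReplicas, expectedRacks);
  }

  public static void main(String[] args) {
    // Normal case: 3 replicas over 2 racks allows at most 2 per rack.
    System.out.println(maxReplicasPerRackForValidation(3, 2, 5)); // prints 2
    try {
      // Transient empty topology: numRacks == 0 while requiredRacks == 2.
      maxReplicasPerRackForValidation(3, 2, 0);
    } catch (ArithmeticException e) {
      System.out.println("caught: " + e.getMessage()); // prints "caught: / by zero"
    }
  }
}
```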
   
   Fix:
   * Compute `numRacks` before the early-return guard, and short-circuit when `numRacks <= 0` or `requiredRacks <= 1` (previously only `requiredRacks == 1`).
   * Add a defensive guard in `getMaxReplicasPerRack` that returns `numReplicas` when `numberOfRacks <= 0`, mirroring the pattern HDDS-14371 used for an analogous divide-by-zero in `ContainerManagerImpl`.
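The two guards can be sketched as follows. As before, `ZeroRackGuard` and its methods are hypothetical simplifications, not the patched Ozone code. The rack count is again a parameter standing in for the topology lookup, and the ceiling formula is an assumption.

```java
// Hypothetical, simplified model of the two guards; not the real Ozone code.
final class ZeroRackGuard {

  // Defensive guard: with no racks known, fall back to numReplicas
  // (no per-rack limit) instead of dividing by zero.
  static int getMaxReplicasPerRack(int numReplicas, int numberOfRacks) {
    if (numberOfRacks <= 0) {
      return numReplicas;
    }
    // Assumed formula: ceiling of numReplicas / numberOfRacks.
    return (numReplicas + numberOfRacks - 1) / numberOfRacks;
  }

  // numRacks is available before the early return, and the guard now
  // covers an empty topology as well as requiredRacks <= 1.
  static int maxReplicasPerRackForValidation(int numReplicas, int requiredRacks, int numRacks) {
    if (numRacks <= 0 || requiredRacks <= 1) {
      return numReplicas;
    }
    int expectedRacks = Math.min(requiredRacks, numRacks); // always >= 1 here
    return getMaxReplicasPerRack(numReplicas, expectedRacks);
  }

  public static void main(String[] args) {
    System.out.println(maxReplicasPerRackForValidation(3, 2, 5)); // prints 2
    System.out.println(maxReplicasPerRackForValidation(3, 2, 0)); // prints 3, no exception
    System.out.println(getMaxReplicasPerRack(3, 0));              // prints 3
  }
}
```

In this sketch the caller-side check alone already keeps the divisor at 1 or more. The guard inside `getMaxReplicasPerRack` is belt-and-braces for any other caller that might pass 0.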
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-15350
   
   ## How was this patch tested?
   
   - `TestSCMCommonPlacementPolicy.testValidateContainerPlacementWithZeroRackTopology` reproduces the empty-topology window with a mocked `NetworkTopology` whose `getNumOfNodes` returns 0. Without the fix, the test fails with `ArithmeticException: / by zero`; with the fix, it passes, and all 20 tests in `TestSCMCommonPlacementPolicy` remain green.
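The idea behind the test can be sketched without a mocking library. In this hypothetical version, a plain `IntSupplier` lambda stands in for the mocked `NetworkTopology` and simply reports a rack count. `EmptyTopologyCheck` and its method are illustrative names, not part of the patch, and the ceiling formula is an assumption.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch: an IntSupplier stub replaces the mocked NetworkTopology.
final class EmptyTopologyCheck {

  // Guarded logic, with numRacks read from the stubbed topology view
  // before the early-return guard.
  static int maxReplicasPerRack(IntSupplier rackCount, int numReplicas, int requiredRacks) {
    int numRacks = rackCount.getAsInt();
    if (numRacks <= 0 || requiredRacks <= 1) {
      return numReplicas;
    }
    int racks = Math.min(requiredRacks, numRacks);
    return (numReplicas + racks - 1) / racks; // assumed ceiling formula
  }

  public static void main(String[] args) {
    IntSupplier emptyTopology = () -> 0;   // transient window: zero racks reported
    IntSupplier healthyTopology = () -> 3;
    System.out.println(maxReplicasPerRack(emptyTopology, 3, 2));   // prints 3
    System.out.println(maxReplicasPerRack(healthyTopology, 3, 2)); // prints 2
  }
}
```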

