smengcl opened a new pull request, #10339: URL: https://github.com/apache/ozone/pull/10339
Generated-by: Claude Code (Opus 4.7)

## What changes were proposed in this pull request?

SCMCommonPlacementPolicy.getMaxReplicasPerRack divides by numberOfRacks without a zero check. The caller (validateContainerPlacement) reaches the divide via Math.min(requiredRacks, numRacks); when the network topology transiently reports zero racks (observed during a DN decommission), the existing requiredRacks==1 short-circuit does not catch it.

The ReplicationMonitor catches the exception and calls:

```
ExitUtil.terminate(1, t)
```

so the SCM JVM exits.

Fix:
* Compute numRacks before the early-return guard and short-circuit when numRacks <= 0 or requiredRacks <= 1 (was: == 1).
* Add a defensive guard in getMaxReplicasPerRack that returns numReplicas when numberOfRacks <= 0, mirroring HDDS-14371's pattern for an analogous div-by-zero in ContainerManagerImpl.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15350

## How was this patch tested?

- TestSCMCommonPlacementPolicy.testValidateContainerPlacementWithZeroRackTopology reproduces the empty-topology window with a mocked NetworkTopology returning 0 from getNumOfNodes. Without the fix the test errors out with "Arithmetic / by zero". With the fix it passes and all 20 tests in TestSCMCommonPlacementPolicy remain green.
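
For reviewers who want to see the shape of the two guards without opening the diff, here is a minimal standalone sketch. It is not the actual SCMCommonPlacementPolicy source: the class name `RackGuardSketch`, the helper `skipRackCheck`, and the ceiling-division formula are assumptions made for illustration. Only the two conditions (`numRacks <= 0 || requiredRacks <= 1`, and returning numReplicas when numberOfRacks <= 0) come from the description above.

```java
/**
 * Illustrative sketch only; NOT the actual SCMCommonPlacementPolicy code.
 * Class and method names here are hypothetical.
 */
final class RackGuardSketch {

  private RackGuardSketch() {
  }

  /**
   * Hypothetical helper mirroring the widened early-return condition:
   * skip the rack-spread check when the topology reports no racks or
   * when at most one rack is required.
   */
  static boolean skipRackCheck(int requiredRacks, int numRacks) {
    return numRacks <= 0 || requiredRacks <= 1;
  }

  /**
   * Defensive variant of the per-rack limit. With no racks reported,
   * return numReplicas instead of dividing by zero.
   */
  static int maxReplicasPerRack(int numReplicas, int numberOfRacks) {
    if (numberOfRacks <= 0) {
      return numReplicas;
    }
    // Assumed formula: ceiling of numReplicas / numberOfRacks
    // for non-negative numReplicas.
    return numReplicas / numberOfRacks
        + Math.min(numReplicas % numberOfRacks, 1);
  }

  public static void main(String[] args) {
    // Zero racks: the guard returns numReplicas rather than throwing.
    System.out.println(maxReplicasPerRack(3, 0));
    // Normal case: 3 replicas over 2 racks.
    System.out.println(maxReplicasPerRack(3, 2));
    // Empty topology: the rack check is skipped.
    System.out.println(skipRackCheck(2, 0));
  }
}
```

Returning numReplicas in the zero-rack case makes the per-rack limit a no-op, so a transiently empty topology cannot raise an ArithmeticException that would propagate to the ReplicationMonitor.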
