Copilot commented on code in PR #10497:
URL: https://github.com/apache/ozone/pull/10497#discussion_r3428675496
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java:
##########
@@ -1478,17 +1482,32 @@ static class NonWritableNodeFilter implements
Predicate<DatanodeInfo> {
ScmConfigKeys.OZONE_DATANODE_RATIS_VOLUME_FREE_SPACE_MIN,
ScmConfigKeys.OZONE_DATANODE_RATIS_VOLUME_FREE_SPACE_MIN_DEFAULT,
StorageUnit.BYTES);
- containerSize = (long) conf.getStorageSize(
- ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE,
- ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT,
- StorageUnit.BYTES);
+ this.tracker = tracker;
}
@Override
public boolean test(DatanodeInfo dn) {
return !dn.getNodeStatus().isNodeWritable()
- || (!hasEnoughSpace(dn, minRatisVolumeSizeBytes, containerSize)
- && !hasEnoughCommittedVolumeSpace(dn));
+ || (!hasEnoughSpaceForNode(dn) &&
!hasEnoughCommittedVolumeSpace(dn));
+ }
+
+ /**
+ * Returns true if the datanode has both an available data slot (via
+ * {@link PendingContainerTracker}) and sufficient metadata volume space.
+ */
+ private boolean hasEnoughSpaceForNode(DatanodeInfo dn) {
+ if (!tracker.hasAvailableSpace(dn)) {
+ return false;
+ }
+ if (minRatisVolumeSizeBytes <= 0) {
+ return true;
+ }
+ for (StorageReportProto report : dn.getStorageReports()) {
+ if (report.getRemaining() > minRatisVolumeSizeBytes) {
+ return true;
+ }
+ }
+ return false;
}
Review Comment:
`hasEnoughSpaceForNode` switched the per-volume free space check from
`VolumeUsage.getUsableSpace(report)` (which accounts for committed bytes /
free-space-to-spare semantics) to `report.getRemaining()`. This can
overestimate writable capacity and allow nodes that should be filtered out. Use
the same usable-space calculation here as elsewhere (eg
`VolumeUsage.getUsableSpace(report)`), so writability filtering matches
placement/slot logic.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -273,7 +271,7 @@ public List<DatanodeDetails>
filterNodesWithSpace(List<DatanodeDetails> nodes,
int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
throws SCMException {
List<DatanodeDetails> nodesWithSpace = nodes.stream().filter(d ->
- hasEnoughSpace(d, metadataSizeRequired, dataSizeRequired))
+ hasEnoughSpace(d, metadataSizeRequired, nodeManager))
.collect(Collectors.toList());
Review Comment:
`filterNodesWithSpace` still accepts `dataSizeRequired` but no longer uses
it after switching to `NodeManager#hasAvailableSpace` (slot-based). This is now
a misleading API (callers can pass different data sizes with no effect). Either
remove `dataSizeRequired` from this method signature (and corresponding call
sites), or incorporate it into the space decision (eg by translating it into
required slots / validating it against the configured slot size).
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/PendingContainerTracker.java:
##########
@@ -208,11 +213,45 @@ public boolean checkSpaceAndRecordAllocation(DatanodeInfo
datanodeInfo, Containe
}
/**
- * Remove a pending container allocation from a specific DataNode.
- * Removes from both current and previous windows.
- * Called when container is confirmed.
+ * Returns true if the given datanode has at least one allocatable container
slot
+ * available, accounting for pending in-flight allocations.
+ *
+ * <p>Slot availability is based on {@code maxContainerSize}: a slot exists
for each
+ * {@code maxContainerSize}-worth of usable space on any volume. This
read-only check
+ * is intended for the placement policy and does not consume a slot.
+ *
+ * @param datanodeInfo the datanode to check
+ * @return true if at least one container slot is available
+ */
+ public boolean hasAvailableSpace(DatanodeInfo datanodeInfo) {
+ Objects.requireNonNull(datanodeInfo, "datanodeInfo == null");
+ List<StorageReportProto> storageReports = datanodeInfo.getStorageReports();
+ if (storageReports.isEmpty()) {
+ return false;
+ }
+ TwoWindowBucket bucket = datanodeInfo.getPendingContainerAllocations();
+ bucket.rollIfNeeded();
+ final int pendingCount = bucket.getCount();
+ long allocatableCount = 0;
+ for (StorageReportProto report : storageReports) {
+ allocatableCount += Math.max(0L, VolumeUsage.getUsableSpace(report)) /
maxContainerSize;
+ if (allocatableCount > pendingCount) {
+ return true;
+ }
+ }
+ LOG.debug("Datanode {} has no available container slots. Pending: {},
Allocatable: {}",
+ datanodeInfo.getID(), pendingCount, allocatableCount);
+ if (metrics != null) {
+ metrics.incNumSkippedFullNodeContainerAllocation();
+ }
+ return false;
Review Comment:
`hasAvailableSpace` is a read-only predicate intended for placement
filtering, but it increments `incNumSkippedFullNodeContainerAllocation()`.
Placement code can call this frequently, which can inflate/reshape this metric
(it may no longer represent actual allocation attempts that were skipped).
Consider either (a) not mutating metrics in `hasAvailableSpace`, or (b)
introducing a separate metric specifically for placement-time 'no-slot' checks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]