github-actions[bot] commented on code in PR #65049:
URL: https://github.com/apache/doris/pull/65049#discussion_r3503208692
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/kafka/KafkaUtil.java:
##########
@@ -240,6 +240,7 @@ private static InternalService.PProxyResult getInfoRequest(InternalService.PProx
             Backend backend = Env.getCurrentSystemInfo().getBackend(beId);
             if (backend != null && backend.isLoadAvailable()
                     && !backend.isDecommissioned()
+                    && !backend.isDecommissioning()
                     && !failedBeIds.contains(beId)
Review Comment:
The new primary predicate skips decommissioning BEs, but the fallback below
still bypasses it when `backendIds` is empty. Lines 255-260 repopulate
candidates from `routineLoadManager.getBlacklist()` and check only
`backend != null`. A BE that was blacklisted by an earlier metadata retry and
later enters decommissioning can therefore still be selected here and receive
`BackendServiceProxy.getInfo` once all primary candidates are filtered out.
`KinesisUtil` has the same fallback at lines 112-115. Please apply the same
load-available/non-decommissioned predicate to the blacklist fallback, or share
candidate construction between the two paths, before sending metadata RPCs.
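
One way to share candidate construction could look like the sketch below. It is not the Doris implementation: `BackendInfo`, `isEligible`, and `selectCandidates` are hypothetical stand-ins, and it assumes the primary path also skips blacklisted BEs and that the fallback should keep skipping `failedBeIds`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MetadataCandidateSketch {
    /** Hypothetical stand-in for the Backend state the predicate reads. */
    static final class BackendInfo {
        final boolean loadAvailable;
        final boolean decommissioned;
        final boolean decommissioning;

        BackendInfo(boolean loadAvailable, boolean decommissioned, boolean decommissioning) {
            this.loadAvailable = loadAvailable;
            this.decommissioned = decommissioned;
            this.decommissioning = decommissioning;
        }
    }

    /** Single predicate used by both the primary path and the blacklist fallback. */
    static boolean isEligible(BackendInfo be) {
        return be != null && be.loadAvailable && !be.decommissioned && !be.decommissioning;
    }

    static List<Long> selectCandidates(List<Long> allBeIds,
                                       Map<Long, BackendInfo> backends,
                                       Set<Long> failedBeIds,
                                       Set<Long> blacklistedBeIds) {
        List<Long> candidates = new ArrayList<>();
        // Primary path: eligible, not failed in this request, not blacklisted (assumption).
        for (Long beId : allBeIds) {
            if (isEligible(backends.get(beId))
                    && !failedBeIds.contains(beId)
                    && !blacklistedBeIds.contains(beId)) {
                candidates.add(beId);
            }
        }
        // Fallback: retry blacklisted BEs, but only those that pass the same predicate.
        if (candidates.isEmpty()) {
            for (Long beId : blacklistedBeIds) {
                if (isEligible(backends.get(beId)) && !failedBeIds.contains(beId)) {
                    candidates.add(beId);
                }
            }
        }
        return candidates;
    }
}
```

With this shape, a blacklisted BE that has since entered decommissioning is never returned, while a healthy blacklisted BE can still serve as a last resort.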
##########
fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadManager.java:
##########
@@ -563,7 +565,7 @@ public long getAvailableBeForTask(long jobId, long previousBeId) throws UserExce
             }
             // 4. on the basis of selecting the maximum idle slot be,
             // try to reuse the object cache as much as possible
-            if (previousBeIdleTaskNum == maxIdleSlotNum) {
+            if (previousBeAvailable && previousBeIdleTaskNum == maxIdleSlotNum) {
                 return previousBeId;
Review Comment:
This still allows a saturated previous BE to be reused when the only idle
capacity comes from a BE that the new availability filter excludes. The
scheduler gate uses `getClusterIdleSlotNum()`, whose slot map is still built
from `getAllBackendIds(true)` and therefore counts alive decommissioning BEs.
After this PR, `getAvailableBackendIds()` excludes that draining BE. With one
saturated eligible previous BE and one idle decommissioning BE, the gate still
passes, but the loop below leaves `maxIdleSlotNum == 0` and `resultBeId == -1`.
Because the previous BE also has zero idle slots, this tie check evaluates
`0 == 0` and returns the previous BE anyway, and `allocateTaskToBe()` submits
another task to that saturated backend. Please keep routine-load slot
accounting aligned with the new eligible-backend predicate, and only reuse the
previous BE when it has a positive idle slot.
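
The requested guard might be sketched as follows. This is an illustration under assumptions, not the actual `getAvailableBeForTask` code: `pickBackend` and the `idleSlotsByEligibleBe` map are hypothetical, and the map is assumed to contain only BEs that pass the new eligibility predicate.

```java
import java.util.Map;

final class PreviousBeReuseSketch {
    /**
     * Returns the BE to run the next task on, or -1 when no eligible BE has an idle slot.
     * idleSlotsByEligibleBe is a hypothetical map of eligible BE id to idle slot count.
     */
    static long pickBackend(Map<Long, Integer> idleSlotsByEligibleBe, long previousBeId) {
        long resultBeId = -1L;
        int maxIdleSlotNum = 0;
        // Find the eligible BE with the most idle slots; BEs with zero idle slots never win.
        for (Map.Entry<Long, Integer> entry : idleSlotsByEligibleBe.entrySet()) {
            if (entry.getValue() > maxIdleSlotNum) {
                maxIdleSlotNum = entry.getValue();
                resultBeId = entry.getKey();
            }
        }
        // Reuse the previous BE only if it is eligible, has a positive idle slot,
        // and ties with the maximum. A 0 == 0 tie must not count as a match.
        Integer previousIdle = idleSlotsByEligibleBe.get(previousBeId);
        if (previousIdle != null && previousIdle > 0 && previousIdle == maxIdleSlotNum) {
            return previousBeId;
        }
        return resultBeId;
    }
}
```

In the scenario above (one saturated eligible BE, one idle decommissioning BE left out of the map), this returns -1 instead of the saturated previous BE, so the caller can defer the task rather than oversubscribe.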
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]