reswqa commented on code in PR #21199: URL: https://github.com/apache/flink/pull/21199#discussion_r1043164686
##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/VertexwiseSchedulingStrategy.java:
##########
@@ -125,24 +122,64 @@ private void maybeScheduleVertices(final Set<ExecutionVertexID> vertices) {
             newVertices.clear();
         }
 
-        final Set<ExecutionVertexID> verticesToDeploy =
-                allCandidates.stream()
-                        .filter(
-                                vertexId -> {
-                                    SchedulingExecutionVertex vertex =
-                                            schedulingTopology.getVertex(vertexId);
-                                    checkState(vertex.getState() == ExecutionState.CREATED);
-                                    return inputConsumableDecider.isInputConsumable(
-                                            vertexId,
-                                            Collections.emptySet(),
-                                            consumableStatusCache);
-                                })
-                        .collect(Collectors.toSet());
+        final Set<ExecutionVertexID> verticesToDeploy = new HashSet<>();
+
+        Set<ExecutionVertexID> nextVertices = allCandidates;
+        while (!nextVertices.isEmpty()) {
+            nextVertices = addToDeployAndGetVertices(nextVertices, verticesToDeploy);
+        }
 
         scheduleVerticesOneByOne(verticesToDeploy);
         scheduledVertices.addAll(verticesToDeploy);
     }
 
+    private Set<ExecutionVertexID> addToDeployAndGetVertices(
+            Set<ExecutionVertexID> currentVertices, Set<ExecutionVertexID> verticesToDeploy) {
+        Set<ExecutionVertexID> nextVertices = new HashSet<>();
+        // cache consumedPartitionGroup's consumable status to avoid compute repeatedly.
+        final Map<ConsumedPartitionGroup, Boolean> consumableStatusCache = new HashMap<>();

Review Comment:
Consider this situation, using the topology you described: First, we call `addToScheduleAndGetVertices` with only one vertex, `A`, and it becomes schedulable. Then `B` and `C` are added to `nextVertices`, triggering the next round of `addToScheduleAndGetVertices`. If `C` is taken out of the `currentVertices` set first, then because `B` has not yet become schedulable, the `ConsumedPartitionGroup` that `B` belongs to is marked as `false` in the `consumableStatusCache`. Next, `B` is taken out of `currentVertices` and is judged schedulable.
It then adds `C` to `nextVertices` again, and in the next round of `addToScheduleAndGetVertices`, `C` checks whether it can be scheduled. Because the `consumableStatusCache` already holds `false` for the group containing `B`, `C` can never be scheduled, even though `B` is by then in the scheduled state.

The example above only considers reusing `consumableStatusCache` across multiple calls of `addToScheduleAndGetVertices`. If `visitedConsumerVertexGroup` is reused as well, `C` will not even be added to `nextVertices` a second time.
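To make the scenario concrete, here is a minimal, self-contained sketch of the stale-cache problem. It is not the Flink implementation: the class `StaleCacheSketch`, the string vertex names, and the two topology maps are hypothetical, and a producer name stands in for a `ConsumedPartitionGroup`. The assumed topology is `A -> B`, `A -> C`, `B -> C`, with `C` deliberately visited before `B`.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StaleCacheSketch {

    // Hypothetical topology (an assumption for illustration): A -> B, A -> C, B -> C.
    static final Map<String, List<String>> PRODUCERS =
            Map.of("A", List.of(), "B", List.of("A"), "C", List.of("A", "B"));
    // Consumers of A are listed as [C, B] so that C is visited before B.
    static final Map<String, List<String>> CONSUMERS =
            Map.of("A", List.of("C", "B"), "B", List.of("C"), "C", List.of());

    /** Runs the round-based loop and returns the vertices chosen for deployment. */
    static Set<String> schedule(boolean reuseCacheAcrossRounds) {
        Set<String> toDeploy = new LinkedHashSet<>();
        Map<String, Boolean> sharedCache = new HashMap<>();
        Set<String> current = new LinkedHashSet<>(List.of("A"));

        while (!current.isEmpty()) {
            // Either one cache for all rounds, or a fresh cache per round.
            Map<String, Boolean> cache =
                    reuseCacheAcrossRounds ? sharedCache : new HashMap<>();
            Set<String> next = new LinkedHashSet<>();
            for (String vertex : current) {
                boolean consumable = true;
                for (String producer : PRODUCERS.get(vertex)) {
                    // Once a status is cached it is never recomputed.
                    consumable &= cache.computeIfAbsent(producer, toDeploy::contains);
                }
                if (consumable && toDeploy.add(vertex)) {
                    next.addAll(CONSUMERS.get(vertex));
                }
            }
            current = next;
        }
        return toDeploy;
    }

    public static void main(String[] args) {
        System.out.println(schedule(true));  // prints [A, B]: C is stuck on the stale entry
        System.out.println(schedule(false)); // prints [A, B, C]
    }
}
```

With a shared cache, the `false` recorded for `B` while `C` was examined in the second round survives into the third round, so `C` is never selected. Creating the cache inside each round, as the diff does, avoids this in the sketch.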