reswqa commented on code in PR #21199:
URL: https://github.com/apache/flink/pull/21199#discussion_r1043164686


##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/VertexwiseSchedulingStrategy.java:
##########
@@ -125,24 +122,64 @@ private void maybeScheduleVertices(final Set<ExecutionVertexID> vertices) {
             newVertices.clear();
         }
 
-        final Set<ExecutionVertexID> verticesToDeploy =
-                allCandidates.stream()
-                        .filter(
-                                vertexId -> {
-                                    SchedulingExecutionVertex vertex =
-                                            schedulingTopology.getVertex(vertexId);
-                                    checkState(vertex.getState() == ExecutionState.CREATED);
-                                    return inputConsumableDecider.isInputConsumable(
-                                            vertexId,
-                                            Collections.emptySet(),
-                                            consumableStatusCache);
-                                })
-                        .collect(Collectors.toSet());
+        final Set<ExecutionVertexID> verticesToDeploy = new HashSet<>();
+
+        Set<ExecutionVertexID> nextVertices = allCandidates;
+        while (!nextVertices.isEmpty()) {
+            nextVertices = addToDeployAndGetVertices(nextVertices, verticesToDeploy);
+        }
 
         scheduleVerticesOneByOne(verticesToDeploy);
         scheduledVertices.addAll(verticesToDeploy);
     }
 
+    private Set<ExecutionVertexID> addToDeployAndGetVertices(
+            Set<ExecutionVertexID> currentVertices, Set<ExecutionVertexID> verticesToDeploy) {
+        Set<ExecutionVertexID> nextVertices = new HashSet<>();
+        // cache consumedPartitionGroup's consumable status to avoid compute repeatedly.
+        final Map<ConsumedPartitionGroup, Boolean> consumableStatusCache = new HashMap<>();

Review Comment:
   Consider this situation, using the topology you described: first, we call `addToDeployAndGetVertices` with only one vertex, `A`, and it becomes schedulable. Then `B` and `C` are added to `nextVertices`, triggering the next round of `addToDeployAndGetVertices`. If `C` is taken out of the `currentVertices` set first, then because `B` has not yet become schedulable, the `ConsumedPartitionGroup` containing `B` will be marked `false` in the `consumableStatusCache`. Next, `B` is taken out of `currentVertices` and judged schedulable. It adds `C` to `nextVertices` again, and in the next round of `addToDeployAndGetVertices`, `C` checks whether it can be scheduled. Since the `consumableStatusCache` still records `B`'s group as `false`, `C` can never be scheduled, even though `B` is already scheduled.
   
   The above example only considers the reuse of `consumableStatusCache` across multiple calls of `addToDeployAndGetVertices`. If the reuse of `visitedConsumerVertexGroup` is added as well, `C` will not even be added to `nextVertices` a second time.
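   
   A minimal, self-contained sketch of the hazard (not the Flink code; `isInputConsumable`, the group key, and the vertex names here are made up for illustration): once the cache records a group as `false`, re-checking the consumer in a later round keeps returning the stale value.
   
   ```java
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   
   public class StaleCacheSketch {
   
       // Vertices that have already been scheduled.
       static final Set<String> scheduled = new HashSet<>();
   
       // Cache reused across rounds: once a group is recorded as false it stays false.
       static final Map<String, Boolean> consumableStatusCache = new HashMap<>();
   
       // A consumer is consumable once every producer in its input group is scheduled.
       static boolean isInputConsumable(String group, List<String> producers) {
           return consumableStatusCache.computeIfAbsent(
                   group, g -> producers.stream().allMatch(scheduled::contains));
       }
   
       public static void main(String[] args) {
           // Round 1: C is checked before B is scheduled, so group "B->C" is cached as false.
           System.out.println(isInputConsumable("B->C", List.of("B"))); // false
   
           scheduled.add("B"); // B becomes schedulable later in the same pass.
   
           // Round 2: C is re-checked, but the stale cached value is returned,
           // so C is never scheduled even though B already is.
           System.out.println(isInputConsumable("B->C", List.of("B"))); // still false
       }
   }
   ```
   
   In other words, any state reused across rounds (the `consumableStatusCache`, and likewise `visitedConsumerVertexGroup`) would need to be invalidated once a producer's status changes within the same scheduling pass.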



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
