eaglewatcherwb edited a comment on issue #8309: [FLINK-12229] [runtime] Implement LazyFromSourcesSchedulingStrategy
URL: https://github.com/apache/flink/pull/8309#issuecomment-490734729

Hi @GJL @tillrohrmann, I have updated the PR based on the latest master. Would you mind starting another round of code review? As discussed in the first round, there are still some open questions; I have summarized them here for convenience. Any comments are highly appreciated.

1. [[SchedulePolicy](https://github.com/apache/flink/pull/8309#discussion_r281049742)] Using vertex state transitions to schedule vertices has the benefit of avoiding a flood of `onPartitionConsumable` notifications, but with the PIPELINED shuffle mode a produced result may sit idle waiting for its consumer to be scheduled. I think we can keep the benefit by relying on both vertex state transitions and `onPartitionConsumable` notifications (see the sketches at the end of this comment):
   1) Set `DeploymentOption#sendScheduleOrUpdateConsumerMessage` to true if the vertex produces any PIPELINED result partition, and to false if all of its produced result partitions are BLOCKING.
   2) Schedule vertices whose input result partitions are BLOCKING via vertex state transitions.
   3) Schedule vertices whose input result partitions are PIPELINED via `onPartitionConsumable` notifications.
2. [[JobGraph Usage](https://github.com/apache/flink/pull/8309#discussion_r281037134)] The only use of `JobGraph` is to provide the `InputDependencyConstraint` in `LazyFromSourcesSchedulingStrategy`; it is not used in `EagerSchedulingStrategy` at all. Maybe we could remove `JobGraph` from `SchedulingStrategyFactory#createInstance` and expose the `InputDependencyConstraint` through `SchedulingTopology` instead, which would require a new method `InputDependencyConstraint getInputDependencyConstraint(JobVertexID jobVertexId)` (also sketched below)?
3. [[ANY/ALL Schedule Granularity](https://issues.apache.org/jira/browse/FLINK-12229)] In the original scheduler, the scheduling granularity is whether ANY or ALL of the consumed IntermediateDataSets are finished. Scheduling at result-partition granularity could speed up deployments, but it may cause a flood of partition-update messages and resource deadlocks, so the implementation in this PR is consistent with the original logic. However, we wonder whether there is a way to keep the deployment speedup while still avoiding the flood of partition updates and the resource deadlock. Based on our production experience, we propose a new trigger, `InputDependencyConstraint#Progress`: a float between 0.0 and 1.0 identifying the fraction of input result partitions that must be finished before the consumer is scheduled. 1.0 means ALL input result partitions must be finished; we configured 0.8 as the default in our production system to balance the deployment speedup against the flood of partition updates and the possible resource deadlock (a threshold check is sketched below).
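
To make the three points above more concrete, here are some rough sketches. They are illustrations of the proposals, not code from this PR, and the helper classes and method names are made up for the sketch.

For point 1, the rule for `DeploymentOption#sendScheduleOrUpdateConsumerMessage` could look roughly like this. It only relies on `ResultPartitionType#isPipelined`; the helper and the plain `Collection<ResultPartitionType>` input are illustrative.

```java
import java.util.Collection;

import org.apache.flink.runtime.io.network.partition.ResultPartitionType;

// Sketch only: send the schedule-or-update-consumers message if the vertex produces
// at least one PIPELINED result partition; if all produced partitions are BLOCKING,
// its consumers are scheduled via vertex state transitions instead.
public final class DeploymentOptionRule {

    public static boolean sendScheduleOrUpdateConsumerMessage(
            Collection<ResultPartitionType> producedPartitionTypes) {
        return producedPartitionTypes.stream().anyMatch(ResultPartitionType::isPipelined);
    }
}
```

For point 2, the new accessor on `SchedulingTopology` would be something like the following; only the proposed method is shown, the existing accessors are elided, and this is a proposal rather than existing API.

```java
import org.apache.flink.api.common.InputDependencyConstraint;
import org.apache.flink.runtime.jobgraph.JobVertexID;

// Sketch only: trimmed-down view of SchedulingTopology with the proposed accessor.
public interface SchedulingTopology {

    // ... existing vertex / result partition accessors elided ...

    /**
     * Proposed addition: returns the InputDependencyConstraint configured for the
     * given job vertex, so the strategy no longer needs the JobGraph to look it up.
     */
    InputDependencyConstraint getInputDependencyConstraint(JobVertexID jobVertexId);
}
```

For point 3, the proposed `Progress` trigger boils down to a threshold check like the one below. The boolean flags stand in for "this input result partition is finished"; in a real implementation that information would come from the scheduling topology.

```java
import java.util.List;

// Sketch only: the proposed Progress trigger. progressThreshold = 1.0 behaves like ALL;
// 0.8 is the production default mentioned above.
public final class ProgressConstraint {

    public static boolean satisfied(List<Boolean> partitionFinished, double progressThreshold) {
        long finished = partitionFinished.stream().filter(f -> f).count();
        return finished >= Math.ceil(progressThreshold * partitionFinished.size());
    }
}
```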