[ 
https://issues.apache.org/jira/browse/FLINK-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830129#comment-16830129
 ] 

BoWang edited comment on FLINK-12229 at 4/30/19 9:48 AM:
---------------------------------------------------------

Hi, [~till.rohrmann] [~gjy] [~tiemsn]
 In the original scheduler, a consumer vertex is scheduled when ANY/ALL of its 
input IntermediateDataSets are consumable, and for the BLOCKING ResultType an 
IntermediateDataSet is consumable only when all of its result partitions are 
finished. Shall we stay consistent with this logic in the new scheduler?
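
To make the rule above concrete, here is a rough sketch. The class, enum, and 
method names are hypothetical and are not Flink APIs; each inner list stands 
for the "finished" flags of the partitions of one input data set.

```java
import java.util.List;

// Hypothetical sketch, not Flink code: models the consumable rule for BLOCKING results.
final class InputConstraintSketch {
    enum Constraint { ANY, ALL }

    // A BLOCKING data set is consumable only when every one of its partitions is finished.
    static boolean isConsumable(List<Boolean> partitionFinished) {
        return partitionFinished.stream().allMatch(Boolean::booleanValue);
    }

    // Each inner list holds the finished flags of one input data set of the consumer.
    static boolean canSchedule(Constraint constraint, List<List<Boolean>> inputDataSets) {
        if (constraint == Constraint.ANY) {
            return inputDataSets.stream().anyMatch(InputConstraintSketch::isConsumable);
        }
        return inputDataSets.stream().allMatch(InputConstraintSketch::isConsumable);
    }
}
```

Note that this naive form rescans every partition on each call.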

Another question: when I implemented the lazy strategy, I found that on every 
producer vertex state change or partition-consumable notification, all the 
input partitions of the consumer vertex are checked to decide whether it 
should be scheduled. With n producer vertices and n consumer vertices, the 
partitions would be checked O(n^2) times, which I think is very inefficient. 
If we add SchedulingIntermediateDataSet and react to vertex state change 
notifications, relying on a counter in SchedulingIntermediateDataSet, only 
O(n) partition checks are needed (this is what I did in [GitHub Pull Request 
#8309|https://github.com/apache/flink/pull/8309]). Another option is to 
maintain some member variables in LazyFromSourcesSchedulingStrategy that do 
the same thing as SchedulingIntermediateDataSet.
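
A minimal sketch of the counter idea, for discussion only. It is not the code 
from the pull request; the class name, the String ids, and the method names 
are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the class from the pull request.
final class DataSetCounterSketch {
    // Number of partitions still unfinished, keyed by data set id.
    private final Map<String, Integer> unfinished = new HashMap<>();

    void register(String dataSetId, int numberOfPartitions) {
        if (numberOfPartitions <= 0) {
            throw new IllegalArgumentException("numberOfPartitions must be positive");
        }
        unfinished.put(dataSetId, numberOfPartitions);
    }

    // Called once per finished partition; constant work, no scan over sibling partitions.
    // Returns true only for the notification that completes the data set.
    boolean onPartitionFinished(String dataSetId) {
        Integer remaining = unfinished.get(dataSetId);
        if (remaining == null || remaining == 0) {
            throw new IllegalStateException("unknown or already finished data set: " + dataSetId);
        }
        unfinished.put(dataSetId, remaining - 1);
        return remaining == 1;
    }

    boolean isConsumable(String dataSetId) {
        Integer remaining = unfinished.get(dataSetId);
        return remaining != null && remaining == 0;
    }
}
```

With n partitions, each of the n notifications does constant work, so the 
total is O(n) instead of rescanning every input partition on each 
notification.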

What do you think?


> Implement Lazy Scheduling Strategy
> ----------------------------------
>
>                 Key: FLINK-12229
>                 URL: https://issues.apache.org/jira/browse/FLINK-12229
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Gary Yao
>            Assignee: BoWang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement a {{SchedulingStrategy}} that covers the functionality of 
> {{ScheduleMode.LAZY_FROM_SOURCES}}, i.e., vertices are scheduled when all the 
> input data are available.
> Acceptance Criteria:
>  * New strategy is tested in isolation using test implementations (i.e., 
> without having to submit a job)



