[jira] [Comment Edited] (FLINK-21920) Optimize ExecutionGraphToInputsLocationsRetrieverAdapter

Zhilong Hong (Jira) Tue, 03 Aug 2021 00:12:08 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-21920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392042#comment-17392042
 ]


Zhilong Hong edited comment on FLINK-21920 at 8/3/21, 7:11 AM:
---------------------------------------------------------------

The proposed solution will make 
{{ExecutionGraphToInputsLocationsRetrieverAdapter}} stateful, which is hard to 
maintain. However, the improvement of this optimization seems not obvious. For 
now we just close it. If later we got a better idea, we'd like to reopen it.


was (Author: thesharing):
The proposed solution will make 
{{ExecutionGraphToInputsLocationsRetrieverAdapter}} stateful, which is hard to 
maintain. However, the improvement of this optimization seems not obvious.

> Optimize ExecutionGraphToInputsLocationsRetrieverAdapter
> --------------------------------------------------------
>
>                 Key: FLINK-21920
>                 URL: https://issues.apache.org/jira/browse/FLINK-21920
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Zhilong Hong
>            Priority: Major
>              Labels: auto-unassigned, pull-request-available
>
> Based on the scheduler benchmark introduced in FLINK-21731, we find that 
> there's a procedure related to {{DefaultScheduler#allocateSlots}} that has 
> O(N^2) complexity, which is: 
> {{ExecutionGraphToInputsLocationsRetrieverAdapter#getConsumedResultPartitionsProducers}}.
> The original implementation is:
> {code:java}
> for all SchedulingExecutionVertex in DefaultScheduler:
>   for all ConsumedPartitionGroup of the SchedulingExecutionVertex:
>     for all IntermediateResultPartition in the ConsumedPartitionGroup:
>       get producer of the IntermediateResultPartition {code}
> This procedure has O(N^2) complexity.
> We can see that for each SchedulingExecutionVertex, the producers of its 
> ConsumedPartitionGroup is calculated separately. For the 
> SchedulingExecutionVertices in the same ConsumerVertexGroup, they have the 
> same ConsumedPartitionGroup. Therefore, we don't need to calculate the 
> producers over and over again. We can use a local cache to cache the 
> producers. This will decrease the complexity from O(N^2) to O(N).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-21920) Optimize ExecutionGraphToInputsLocationsRetrieverAdapter

Reply via email to