[ https://issues.apache.org/jira/browse/FLINK-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061696#comment-17061696 ]
Gary Yao commented on FLINK-16001:
----------------------------------

First of all, thanks for the benchmark. I am able to reproduce the results.

Next, I have to correct my own statement that the complexity is linear in the number of distinct pipelined regions. Since we have to touch every vertex, the complexity should be linear in the number of vertices. However, the time difference between the streams and the non-streams versions in your benchmark is less than 1ms for 5000 regions. By increasing the number of vertices per region to 21, I can measure a difference of 8ms. This is a drop in the bucket, especially considering that building the regions can take several seconds. Therefore, rewriting the code to non-streams should be motivated by legibility, not performance.

If you still insist on this performance improvement, I can assign you to this ticket, but I would recommend optimizing code paths that are actually slow.

> Avoid using Java Streams in construction of ExecutionGraph
> ----------------------------------------------------------
>
>                 Key: FLINK-16001
>                 URL: https://issues.apache.org/jira/browse/FLINK-16001
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Jiayi Liao
>            Priority: Major
>         Attachments: benchmark.csv
>
>
> I think we should avoid {{Java Streams}} in construction of
> {{ExecutionGraph}} like function {{toPipelinedRegionsSet}} in
> {{PipelinedRegionComputeUtil}} because the job submission is definitely
> performance sensitive, especially when {{distinctRegions}} has a large
> cardinality.
> Also includes some other places in package
> {{org.apache.flink.runtime.executiongraph}}
> cc [~trohrmann] [~gjy] [~zhuzh]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)