[jira] [Updated] (FLINK-28663) Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side
[ https://issues.apache.org/jira/browse/FLINK-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-28663:
---
    Labels: pull-request-available  (was: )

> Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side
> ---
>
>                 Key: FLINK-28663
>                 URL: https://issues.apache.org/jira/browse/FLINK-28663
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, one intermediate dataset can only be consumed by one downstream
> consumer vertex. If multiple consumer vertices consume the same output of
> the same upstream vertex, multiple intermediate datasets are produced. We
> can optimize this behavior to produce only one intermediate dataset, which
> is then shared by the consumer vertices. As the first step, we should allow
> multiple downstream consumer job vertices to share the same intermediate
> dataset on the scheduler side. (Note that this optimization only works for
> blocking shuffle, because a pipelined shuffle result partition cannot be
> consumed multiple times.)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-28663) Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side
[ https://issues.apache.org/jira/browse/FLINK-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingjie Cao updated FLINK-28663:
---
        Parent: FLINK-28374
    Issue Type: Sub-task  (was: Improvement)

> Allow multiple downstream consumer job vertices sharing the same intermediate dataset at scheduler side
> ---
>
>                 Key: FLINK-28663
>                 URL: https://issues.apache.org/jira/browse/FLINK-28663
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Yingjie Cao
>            Priority: Major
>
> Currently, one intermediate dataset can only be consumed by one downstream
> consumer vertex. If multiple consumer vertices consume the same output of
> the same upstream vertex, multiple intermediate datasets are produced. We
> can optimize this behavior to produce only one intermediate dataset, which
> is then shared by the consumer vertices. As the first step, we should allow
> multiple downstream consumer job vertices to share the same intermediate
> dataset on the scheduler side. (Note that this optimization only works for
> blocking shuffle, because a pipelined shuffle result partition cannot be
> consumed multiple times.)
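
The constraint described in the issue can be illustrated with a small toy model. This is only a sketch and not Flink's scheduler code: the class SharedDataSetSketch, the nested IntermediateDataSet and ResultType types, and the addConsumer method are all hypothetical names chosen for illustration. It assumes the simplest possible rule: a blocking dataset may register any number of consumer vertices, while a pipelined dataset rejects a second consumer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: these are hypothetical types, not Flink's runtime classes.
public class SharedDataSetSketch {

    /** Simplified stand-in for the two shuffle modes named in the issue. */
    public enum ResultType { BLOCKING, PIPELINED }

    /** One produced dataset plus the consumer vertices registered on it. */
    public static final class IntermediateDataSet {
        private final String id;
        private final ResultType type;
        private final List<String> consumers = new ArrayList<>();

        public IntermediateDataSet(String id, ResultType type) {
            this.id = id;
            this.type = type;
        }

        /** Registers a consumer; a pipelined dataset accepts only one. */
        public void addConsumer(String consumerVertex) {
            if (type == ResultType.PIPELINED && !consumers.isEmpty()) {
                throw new IllegalStateException(
                        "Pipelined dataset " + id + " already has a consumer");
            }
            consumers.add(consumerVertex);
        }

        public List<String> getConsumers() {
            return Collections.unmodifiableList(consumers);
        }
    }

    public static void main(String[] args) {
        // A blocking dataset is shared by two consumer vertices.
        IntermediateDataSet blocking =
                new IntermediateDataSet("ds-1", ResultType.BLOCKING);
        blocking.addConsumer("consumer-A");
        blocking.addConsumer("consumer-B");
        System.out.println("ds-1 consumers: " + blocking.getConsumers());

        // A pipelined dataset rejects the second consumer.
        IntermediateDataSet pipelined =
                new IntermediateDataSet("ds-2", ResultType.PIPELINED);
        pipelined.addConsumer("consumer-A");
        try {
            pipelined.addConsumer("consumer-B");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the sharing decision lives entirely in addConsumer, which mirrors the issue's note that only blocking results can be consumed multiple times. How the real scheduler tracks consumers is not described in these messages.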