Zhanghao Chen created FLINK-31827: ------------------------------------- Summary: Incorrect estimation of the target data rate of a vertex when only a subset of its upstream vertex's output is consumed Key: FLINK-31827 URL: https://issues.apache.org/jira/browse/FLINK-31827 Project: Flink Issue Type: Bug Components: Autoscaler Reporter: Zhanghao Chen Attachments: image-2023-04-17-23-37-35-280.png
Currently, the target data rate of a vertex = SUM(target data rate * input/output ratio) for all of its upstream vertices. This assumes that all output records of an upstream vertex is consumed by the downstream vertex. However, it does not always hold. Consider the following job plan generated by a Flink SQL job. The middle vertex contains multiple chained Calc(select xx) operators, each connecting to a separate downstream sink tasks. As a result, each sink task only consumes a sub-portion of the middle vertex's output. To fix it, we need operator level edge info to infer the upstream-downstream relationship as well as operator level output metrics. The metrics part is easy but AFAIK, there's no way to get the operator level edge info from the Flink REST API yet. !image-2023-04-17-23-37-35-280.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)