[jira] [Created] (FLINK-31245) Adaptive scheduler does not reset the state of GlobalAggregateManager when rescaling

Zhanghao Chen (Jira) Mon, 27 Feb 2023 05:05:08 -0800

Zhanghao Chen created FLINK-31245:
-------------------------------------

             Summary: Adaptive scheduler does not reset the state of GlobalAggregateManager when rescaling
                 Key: FLINK-31245
                 URL: https://issues.apache.org/jira/browse/FLINK-31245
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.16.1
            Reporter: Zhanghao Chen
             Fix For: 1.18.0



*Problem*

GlobalAggregateManager is used to share state among parallel tasks in a job 
and thus coordinate their execution. It keeps its state (the _accumulators_ 
field in JobMaster) in JobManager (JM) memory. The content of the accumulator 
state is defined in user code. In my company, for example, a user stores the 
task parallelism in the accumulator, assuming that task parallelism never 
changes. However, this assumption is broken when using the adaptive scheduler: 
it can rescale the job while the accumulator state in the JobMaster is kept 
as is.
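To make the failure mode concrete, here is a minimal, Flink-free sketch of the JobMaster-side accumulator map. It assumes a simplified aggregate interface (`SimpleAggregateFunction`, hypothetical; Flink's real `AggregateFunction` also has `merge()`) and a hypothetical user aggregate, `FirstParallelism`, that keeps the first parallelism it sees. Because the map outlives the rescale, a subtask that reports parallelism 2 afterwards still reads back 4.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Flink-free model of the JobMaster-side `accumulators` map.
// All type names here are hypothetical; they are not Flink classes.
class StaleAggregateDemo {

    /** Hypothetical stand-in for Flink's AggregateFunction (merge() omitted). */
    interface SimpleAggregateFunction<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
    }

    /** Models JobMaster state: one accumulator per aggregate name, kept in JM memory. */
    static class JobMasterAggregates {
        private final Map<String, Object> accumulators = new HashMap<>();

        @SuppressWarnings("unchecked")
        <IN, ACC, OUT> OUT updateGlobalAggregate(
                String name, IN value, SimpleAggregateFunction<IN, ACC, OUT> fn) {
            ACC acc = (ACC) accumulators.get(name);
            if (acc == null) {
                acc = fn.createAccumulator();
            }
            acc = fn.add(value, acc);
            accumulators.put(name, acc);
            return fn.getResult(acc);
        }
    }

    /** Models user code that assumes parallelism never changes: keeps the first value seen. */
    static class FirstParallelism implements SimpleAggregateFunction<Integer, Integer, Integer> {
        public Integer createAccumulator() { return null; }
        public Integer add(Integer value, Integer acc) { return acc == null ? value : acc; }
        public Integer getResult(Integer acc) { return acc; }
    }

    public static void main(String[] args) {
        JobMasterAggregates jm = new JobMasterAggregates();
        FirstParallelism fn = new FirstParallelism();
        // Initial deployment: four subtasks each report parallelism 4.
        for (int i = 0; i < 4; i++) {
            jm.updateGlobalAggregate("parallelism", 4, fn);
        }
        // The job is rescaled to 2, but the accumulator map is not reset,
        // so a rescaled subtask is handed the stale value.
        int seen = jm.updateGlobalAggregate("parallelism", 2, fn);
        System.out.println("parallelism seen after rescale: " + seen);
    }
}
```

Running `main` prints `parallelism seen after rescale: 4`.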

*Possible Solutions*
 # Mark GlobalAggregateManager as deprecated. It seems that the operator 
coordinator can completely replace GlobalAggregateManager and is a more elegant 
solution. Therefore, it would be fine to deprecate GlobalAggregateManager and 
leave this issue as is. If that's the case, we can open another ticket for the 
deprecation.
 # If we decide to continue supporting GlobalAggregateManager, then we need to 
reset the state when rescaling.
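Option 2 could look roughly like the following sketch. It is not Flink's implementation: `ResettableAggregates`, `resetOnRescale()` and `SimpleAggregateFunction` are hypothetical names, and the assumption is that the scheduler would call the reset hook before redeploying tasks with the new parallelism.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option 2 only; every type and method name here is hypothetical, not Flink API.
class ResetOnRescaleDemo {

    /** Hypothetical stand-in for Flink's AggregateFunction (merge() omitted). */
    interface SimpleAggregateFunction<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
    }

    /** JobMaster-side accumulators plus a hook the scheduler would call on rescale. */
    static class ResettableAggregates {
        private final Map<String, Object> accumulators = new HashMap<>();

        @SuppressWarnings("unchecked")
        <IN, ACC, OUT> OUT updateGlobalAggregate(
                String name, IN value, SimpleAggregateFunction<IN, ACC, OUT> fn) {
            ACC acc = (ACC) accumulators.get(name);
            if (acc == null) {
                acc = fn.createAccumulator();
            }
            acc = fn.add(value, acc);
            accumulators.put(name, acc);
            return fn.getResult(acc);
        }

        /** Hypothetical hook: discard all aggregates before tasks restart with a new parallelism. */
        void resetOnRescale() {
            accumulators.clear();
        }
    }

    /** Models the user code described above: keeps the first parallelism it sees. */
    static class FirstParallelism implements SimpleAggregateFunction<Integer, Integer, Integer> {
        public Integer createAccumulator() { return null; }
        public Integer add(Integer value, Integer acc) { return acc == null ? value : acc; }
        public Integer getResult(Integer acc) { return acc; }
    }

    public static void main(String[] args) {
        ResettableAggregates jm = new ResettableAggregates();
        FirstParallelism fn = new FirstParallelism();
        jm.updateGlobalAggregate("parallelism", 4, fn); // before rescaling
        jm.resetOnRescale();                            // scheduler rescales 4 -> 2
        int seen = jm.updateGlobalAggregate("parallelism", 2, fn);
        System.out.println("parallelism seen after rescale: " + seen);
    }
}
```

Running `main` prints `parallelism seen after rescale: 2`. Clearing every aggregate is the simplest policy, but it also discards aggregates whose meaning does not depend on parallelism, so a finer-grained reset may be worth considering.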



--
This message was sent by Atlassian Jira
(v8.20.10#820010)