Kartikey Pant created FLINK-35319:
-------------------------------------

             Summary: Specific Operator getting stuck in Initialization for a long time
                 Key: FLINK-35319
                 URL: https://issues.apache.org/jira/browse/FLINK-35319
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.16.1
            Reporter: Kartikey Pant
         Attachments: Screenshot 2024-05-09 at 2.19.10 PM.png, image.png

Our team has been working on a Flink service. After completing development, 
we moved on to job stabilisation exercises under production load.
 
Under high load, we see that when the job restarts (mostly due to 
"org.apache.flink.util.FlinkExpectedException: The TaskExecutor is shutting 
down"), one of the operators gets stuck in the INITIALIZING state.
 
This happens even when all the required capacity is available and all the TMs 
are up and running. Other operators, including ones with higher parallelism 
than this operator, initialise quickly, whereas this operator sometimes takes 
more than 30 minutes.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
