[ https://issues.apache.org/jira/browse/FLINK-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski closed FLINK-16728.
----------------------------------
    Resolution: Not A Bug

As [~zhuzh] suggested, this is probably a case where some operator was blocked and not responding to cancellations/interrupts.

If you, [~lilyevsky], can pinpoint what was not responding (collect a stack trace?), and it points to a bug in Flink code (and not, for example, in custom user code), please open another ticket so we can fix that issue. Here I think everything is working as expected.

> Taskmanager dies after job got stuck and canceling fails
> --------------------------------------------------------
>
>                 Key: FLINK-16728
>                 URL: https://issues.apache.org/jira/browse/FLINK-16728
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.10.0
>            Reporter: Leonid Ilyevsky
>            Priority: Major
>         Attachments: taskmanager.log.20200323.gz
>
>
> At some point I noticed that a few jobs got stuck (they basically stopped
> processing the messages; I could detect this by watching the expected output),
> so I tried to cancel them.
> The cancel operation failed, complaining that the job got stuck at
> StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:86)
> and then the whole taskmanager shut down.
> See the attached log.
> This is actually happening practically every day in our staging environment
> where we are testing Flink 1.10.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)