[ 
https://issues.apache.org/jira/browse/FLINK-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534797#comment-15534797
 ] 

Zhijiang Wang edited comment on FLINK-4715 at 9/30/16 2:31 AM:
---------------------------------------------------------------

Yes, we have already run into this problem many times in production, because 
user code cannot be controlled. If the task thread is blocked waiting for a 
synchronized lock, or is stuck in a similar state, it cannot be cancelled. Our 
approach is that once the job master has failed to cancel the task a number of 
times, it tells the task manager to exit.
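
A minimal sketch of that escalation, for illustration only. The class 
CancellationWatchdog, its parameters and the injected fatalHandler are 
hypothetical names and not part of Flink's API. It assumes cancellation is 
attempted by interrupting the task thread, which has no effect on a thread 
blocked on a synchronized lock:

```java
/** Hypothetical sketch: escalate to a fatal handler when a task cannot be cancelled. */
final class CancellationWatchdog {
    private final Thread taskThread;
    private final int maxAttempts;
    private final long waitPerAttemptMillis;
    private final Runnable fatalHandler;

    CancellationWatchdog(Thread taskThread, int maxAttempts,
                         long waitPerAttemptMillis, Runnable fatalHandler) {
        if (maxAttempts < 1 || waitPerAttemptMillis <= 0) {
            // join(0) would wait forever, so a positive timeout is required.
            throw new IllegalArgumentException("attempts and timeout must be positive");
        }
        this.taskThread = taskThread;
        this.maxAttempts = maxAttempts;
        this.waitPerAttemptMillis = waitPerAttemptMillis;
        this.fatalHandler = fatalHandler;
    }

    /**
     * Interrupts the task thread up to maxAttempts times, waiting after each attempt.
     * Returns true if the thread terminated; otherwise runs the fatal handler
     * and returns false.
     */
    boolean cancel() throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            taskThread.interrupt();
            taskThread.join(waitPerAttemptMillis);
            if (!taskThread.isAlive()) {
                return true;
            }
        }
        // All attempts failed: the thread is stuck, so give up on the process.
        fatalHandler.run();
        return false;
    }
}
```

In a real process the handler could be something like 
() -> Runtime.getRuntime().halt(-1), so that the resources held by the stuck 
thread are released along with the JVM. Keeping the handler injectable is only 
a choice made here so the sketch can be exercised without killing the process.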


was (Author: zjwang):
Yes, we already experienced this problem in real production many times,  
because the user code can not be controlled. If the thread is waiting for 
synchronized lock or other cases, it can not be cancelled, and the job master 
cancel the task failed many times, the job master will let the task manager 
exit itself.

> TaskManager should commit suicide after cancellation failure
> ------------------------------------------------------------
>
>                 Key: FLINK-4715
>                 URL: https://issues.apache.org/jira/browse/FLINK-4715
>             Project: Flink
>          Issue Type: Improvement
>          Components: TaskManager
>    Affects Versions: 1.2.0
>            Reporter: Till Rohrmann
>             Fix For: 1.2.0
>
>
> In case of a failed cancellation, e.g. the task cannot be cancelled after a 
> given time, the {{TaskManager}} should kill itself. That way we guarantee 
> that there is no resource leak. 
> This behaviour acts as a safety-net against faulty user code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)