[jira] [Reopened] (FLINK-12219) Yarn application can't stop when flink job failed in per-job yarn cluster mode

lamber-ken (JIRA) Mon, 06 May 2019 05:18:44 -0700


[ https://issues.apache.org/jira/browse/FLINK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


lamber-ken reopened FLINK-12219:
--------------------------------

> Yarn application can't stop when flink job failed in per-job yarn cluster mode
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-12219
>                 URL: https://issues.apache.org/jira/browse/FLINK-12219
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / REST
>    Affects Versions: 1.6.3, 1.8.0
>            Reporter: lamber-ken
>            Assignee: lamber-ken
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: fix-bug.patch, image-2019-04-17-15-00-40-687.png, 
> image-2019-04-17-15-02-49-513.png, image-2019-04-23-17-37-00-081.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. *Issue detail info*
> In our Flink (1.6.3) production environment, I often encounter a situation where
> the YARN application doesn't stop when a Flink job fails in per-job YARN cluster
> mode, so I analyzed in depth why this happens.
> When a Flink job fails, the system writes an archive file to a FileSystem
> through the +MiniDispatcher#archiveExecutionGraph+ method, then notifies
> YarnJobClusterEntrypoint to shut down. However, if
> +MiniDispatcher#archiveExecutionGraph+ throws an exception during execution, it
> affects the calls that follow it, including the shutdown notification.
> So I opened
> [FLINK-12247|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-12247]
>  to fix the NPE that occurs when the system writes the archive to the FileSystem.
> But we still need to consider other exceptions, so we should catch
> Exception / Throwable, not just IOException.
> h3. *Flink YARN job failure flow*
> !image-2019-04-23-17-37-00-081.png!
> h3. *Flink job failure on YARN*
> !image-2019-04-17-15-00-40-687.png!
>  
> h3. *Flink YARN application can't stop*
> !image-2019-04-17-15-02-49-513.png!
>  
>  
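
For illustration only, here is a minimal Java sketch of the fix proposed above. It does not reproduce Flink's actual MiniDispatcher code: the class ArchiveThenShutdownSketch, the Archiver interface and both methods are hypothetical, and a CompletableFuture stands in for the shutdown notification. The first variant only catches IOException, so a NullPointerException thrown while archiving escapes and the shutdown signal is never completed. The second catches Throwable, treats archiving as best effort, and always completes the shutdown signal.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, NOT Flink's MiniDispatcher: it only models the
// "archive the failed job, then signal shutdown" sequence from this issue.
public class ArchiveThenShutdownSketch {

    // Hypothetical stand-in for the step that writes the archive to a FileSystem.
    interface Archiver {
        void archive(String jobId) throws IOException;
    }

    // Buggy variant: only IOException is caught, so an unchecked exception
    // (e.g. a NullPointerException) escapes and shutdown is never signalled.
    static CompletableFuture<String> archiveAndShutdownOnlyIOException(
            String jobId, Archiver archiver) {
        CompletableFuture<String> shutdownSignal = new CompletableFuture<>();
        try {
            archiver.archive(jobId);
        } catch (IOException e) {
            System.err.println("Could not archive job " + jobId + ": " + e);
        }
        // Not reached if archive() threw an unchecked exception.
        shutdownSignal.complete(jobId);
        return shutdownSignal;
    }

    // Fixed variant: archiving is best effort, so any Throwable is logged
    // and the shutdown signal is always completed afterwards.
    static CompletableFuture<String> archiveAndShutdown(String jobId, Archiver archiver) {
        CompletableFuture<String> shutdownSignal = new CompletableFuture<>();
        try {
            archiver.archive(jobId);
        } catch (Throwable t) {
            System.err.println("Could not archive job " + jobId + ": " + t);
        }
        shutdownSignal.complete(jobId);
        return shutdownSignal;
    }

    public static void main(String[] args) {
        // Simulate an NPE (like the one addressed in FLINK-12247) while archiving.
        Archiver failingArchiver = id -> {
            throw new NullPointerException("simulated NPE while archiving");
        };

        try {
            archiveAndShutdownOnlyIOException("job-1", failingArchiver);
            System.out.println("buggy variant: shutdown signalled");
        } catch (NullPointerException e) {
            System.out.println("buggy variant: NPE escaped, shutdown never signalled");
        }

        CompletableFuture<String> fixed = archiveAndShutdown("job-2", failingArchiver);
        System.out.println("fixed variant: shutdown signalled = " + fixed.isDone());
    }
}
```

Running main shows the NPE escaping the IOException-only variant, while the Throwable variant still completes the shutdown signal. Catching Throwable also swallows Errors such as OutOfMemoryError; catching Exception is the narrower of the two options mentioned in the issue.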



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
