[ https://issues.apache.org/jira/browse/OOZIE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley reassigned OOZIE-2326:
----------------------------------------------

    Assignee: Satish Subhashrao Saley

> oozie/yarn/spark: active container remains after failed job
> -----------------------------------------------------------
>
>                 Key: OOZIE-2326
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2326
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>    Affects Versions: 4.1.0
>         Environment: pseudo-distributed (single VM), CentOS 6.6, CDH 5.4.3
>            Reporter: Diana Carroll
>            Assignee: Satish Subhashrao Saley
>         Attachments: container-logs.txt, ooziejob-logs.txt, yarnbug1.png, 
> yarnbug2.png
>
>
> The issue occurs when I launch a Spark job (local mode) that fails. (My 
> example failed because I tried to read a non-existent file; see the sketch 
> below for a minimal example of such a job.) When this occurs, the job fails 
> and YARN ends up in an inconsistent state: the ResourceManager shows the 
> launcher job as completed, but a container for the job is still live on the 
> slave node. Because I'm running in pseudo-distributed mode, this completely 
> hangs my cluster: no other jobs can run because there are only enough 
> resources for a single container, and that container is holding the dead 
> Oozie launcher.
> If I wait long enough, YARN eventually times out, releases the container, 
> and starts accepting new jobs. But until then I'm dead in the water.
> Attaching screenshots that show the state right after running the failed job:
> - the ResourceManager shows no jobs running
> - the node shows one container running
> Also attaching log files for the Oozie job and the container.
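
The reporter's actual job is not attached, so the following is only an
illustrative sketch of the kind of failing job described above: a Spark
application run in local mode that reads a non-existent HDFS path, where the
first action throws and the job fails (and, per the report, the Oozie launcher
container is left running). The object name and input path are assumptions.

    // Hypothetical minimal reproduction (not the reporter's code): a Spark job
    // run in local mode that fails because its input path does not exist.
    import org.apache.spark.{SparkConf, SparkContext}

    object MissingFileJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("MissingFileJob")
          .setMaster("local[*]") // local mode, as in the report
        val sc = new SparkContext(conf)
        try {
          // count() forces evaluation; since the path does not exist, it throws
          // InvalidInputException ("Input path does not exist") and the job fails.
          val lines = sc.textFile("hdfs:///user/example/does-not-exist.txt")
          println(lines.count())
        } finally {
          sc.stop()
        }
      }
    }

When a job like this is submitted through an Oozie Spark action, the failure
is what the report says triggers the lingering container on the node.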


