[ 
https://issues.apache.org/jira/browse/SPARK-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Angelini updated SPARK-9745:
---------------------------------
    Description: 
When a job has only a single executor remaining and that executor dies (due to 
something like an OOM), the application fails to notice that there are no 
executors left and it hangs indefinitely.

This only happens when dynamic allocation is enabled.
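
As a rough sketch of the kind of setup involved, the snippet below assembles a spark-submit command line with dynamic allocation turned on. The property names are standard Spark settings, but the values (in particular capping the application at one executor), the helper spark_submit_args, and the script name job.py are assumptions made for illustration and are not taken from this report.

```python
# Hypothetical settings for trying to reproduce the hang. The values are
# assumptions for illustration, not the reporter's actual configuration.
REPRO_CONF = {
    # Dynamic allocation is the condition the report says is required.
    "spark.dynamicAllocation.enabled": "true",
    # The external shuffle service is needed for dynamic allocation on YARN.
    "spark.shuffle.service.enabled": "true",
    # Assumed bounds so the job runs with at most one executor.
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "1",
}


def spark_submit_args(script, conf):
    """Build a spark-submit argument list with one --conf flag per property."""
    args = ["spark-submit"]
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    args.append(script)
    return args


if __name__ == "__main__":
    # job.py is a placeholder for a PySpark script whose single executor
    # is expected to die (for example from an OOM).
    print(" ".join(spark_submit_args("job.py", REPRO_CONF)))
```

Master and deploy-mode flags are left out on purpose; they depend on the cluster and would have to be added for a real YARN run.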

The following images were taken from a hung application with no executors:

!logs_hung_job.png!

^^ *Notice how 1 executor was lost, but the application never requested it to 
be removed*

!am_hung_job.png!

!executors_hung_job.png!

!tasks_hung_job.png!


> Application hangs when the last executor fails with dynamic allocation
> ----------------------------------------------------------------------
>
>                 Key: SPARK-9745
>                 URL: https://issues.apache.org/jira/browse/SPARK-9745
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Scheduler, YARN
>    Affects Versions: 1.5.0
>         Environment: YARN + Pyspark + Dynamic Allocation
>            Reporter: Alex Angelini
>         Attachments: am_hung_job.png, executors_hung_job.png, 
> logs_hung_job.png, tasks_hung_job.png


