[ 
https://issues.apache.org/jira/browse/SPARK-11801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034914#comment-15034914
 ] 

Mridul Muralidharan commented on SPARK-11801:
---------------------------------------------


There are a few aspects here:
a) A race condition between the OOM being thrown vs (the kill being invoked + 
the SIGINT handler starting shutdown).
b) Whether we actually see the OOM (libraries and user code are known to 
swallow it, for example).
c) If we do see the OOM, whether we have sufficient time to do anything about it.
d) An OOM can be thrown on any thread that requests memory - this includes 
executor threads, spark daemon threads, hadoop threads (dfs, yarn, etc), and 
others. And an OOM being thrown causes the kill to be executed.
e) When the VM exhausts memory / when the kill is executed, we are at a point 
in the VM lifecycle where things are slightly unstable and can have unexpected 
failures.


Given all of this, I am very unsure about trying to handle OOM - in particular, 
if we add code to handle it and send a message to the driver, devs/users will 
start expecting it to work, making things more confusing all around. But that 
is a personal opinion, since I am used to mucking through logs :-)

Sidenote: Threads don't need to be created at shutdown time - when we register 
shutdown hooks we register threads; they are just not scheduled yet (start() 
has not been invoked, but the native init and other expensive bits are done).
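The sidenote can be illustrated with a short sketch: the hook thread is fully constructed at registration time, and the JVM only invokes start() on it during shutdown, so the expensive construction work is already behind us when memory is scarce.

```java
public class ShutdownHookSketch {
    public static void main(String[] args) {
        // The hook thread is constructed up front: allocation and native
        // init happen here, at registration time, not at shutdown time.
        Thread hook = new Thread(() -> System.out.println("hook ran"));
        Runtime.getRuntime().addShutdownHook(hook);
        // Registered but not scheduled: start() has not been invoked yet.
        System.out.println("hook state before shutdown: " + hook.getState());
        // On normal JVM exit, the runtime calls hook.start() and waits for it.
    }
}
```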



> Notify driver when OOM is thrown before executor JVM is killed 
> ---------------------------------------------------------------
>
>                 Key: SPARK-11801
>                 URL: https://issues.apache.org/jira/browse/SPARK-11801
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Srinivasa Reddy Vundela
>            Priority: Minor
>
> Here is some background for the issue.
> A customer got an OOM exception in one of the tasks, and the executor was 
> killed with kill %p. It is unclear from the driver logs/Spark UI why the 
> task or executor was lost; the customer has to look into the executor logs 
> to see that OOM is the cause of the task/executor loss. 
> It would be helpful if the driver logs/Spark UI showed the reason for task 
> failures by making sure that the task updates the driver with the OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
