[ 
https://issues.apache.org/jira/browse/SPARK-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986175#comment-13986175
 ] 

Mark Hamstra commented on SPARK-1620:
-------------------------------------

Two more instances of the problem, although neither actually causes trouble at 
the moment: in deploy.worker.Worker and deploy.client.AppClient, 
tryRegisterAllMasters() can throw exceptions (e.g., from 
Master.toAkkaUrl(masterUrl)), and those exceptions would go unhandled in the 
calls made from within the Akka scheduler, i.e. within an invocation of 
registerWithMaster, all but the first call to tryRegisterAllMasters().  Right 
now, any later call to tryRegisterAllMasters() that would throw an exception 
should already have thrown in the first call, which occurs outside the 
scheduled thread, so we should never reach the problem case.  If, however, that 
behavior changes in the future so that tryRegisterAllMasters() can succeed on 
the first call but throw within the later, scheduled calls (or if code added 
within the scheduled retryTimer can throw an exception), then the exception 
thrown from the scheduler thread will not be caught. 
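
One way to guard against the future case described above is to handle 
exceptions inside the scheduled body itself, on the thread that actually runs 
it. The following is only a sketch under stated assumptions: it uses 
java.util.concurrent as a stand-in for the Akka scheduler, the guarded() helper 
and the onError callback are hypothetical names, and the failing Runnable merely 
stands in for a scheduled call to tryRegisterAllMasters(). It is not the Spark 
code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class GuardedRetry {
    // Hypothetical helper: wraps a task so that failures are handled on the
    // thread that runs the task, not on the thread that scheduled it.
    static Runnable guarded(Runnable body, Consumer<Exception> onError) {
        return () -> {
            try {
                body.run();
            } catch (Exception e) {
                onError.accept(e);
            }
        };
    }

    // Schedules a failing stand-in for a retry call and returns the message
    // of the exception the handler saw (null if the handler never ran).
    static String runDemo() throws InterruptedException {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        AtomicReference<String> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);

        // Stand-in for a scheduled tryRegisterAllMasters() that throws.
        Runnable retry = () -> {
            throw new IllegalArgumentException("bad master URL");
        };

        scheduler.schedule(
            guarded(retry, e -> {
                seen.set(e.getMessage());
                handled.countDown();
            }),
            10, TimeUnit.MILLISECONDS);

        handled.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handler saw: " + runDemo());
    }
}
```

What the handler should do (log, retry, or shut the process down) is a 
separate decision; the sketch only shows where the catch has to live. 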

> Uncaught exception from Akka scheduler
> --------------------------------------
>
>                 Key: SPARK-1620
>                 URL: https://issues.apache.org/jira/browse/SPARK-1620
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Mark Hamstra
>            Priority: Blocker
>
> I've been looking at this one in the context of a BlockManagerMaster that 
> OOMs and doesn't respond to heartBeat(), but I suspect that there may be 
> problems elsewhere where we use Akka's scheduler.
> The basic nature of the problem is that we are expecting exceptions thrown 
> from a scheduled function to be caught in the thread where 
> _ActorSystem_.scheduler.schedule() or scheduleOnce() has been called.  In 
> fact, the scheduled function runs on its own thread, so any exceptions that 
> it throws are not caught in the thread that called schedule() -- e.g., 
> unanswered BlockManager heartBeats (scheduled in BlockManager#initialize) 
> that end up throwing exceptions in BlockManagerMaster#askDriverWithReply do 
> not cause those exceptions to be handled by the Executor thread's 
> UncaughtExceptionHandler. 
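
The behavior described in the issue can be reproduced on the JVM without Akka. 
The sketch below is an assumption-laden analogue, not the Spark or Akka code: it 
uses java.util.concurrent's ScheduledExecutorService, and the class and method 
names are made up for illustration. A try/catch around the call that schedules 
the task never sees the exception, because the task throws later on a different 
thread. (With ScheduledExecutorService the exception ends up stored in the 
returned Future; the point about the calling thread is the same.)

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CallerCannotCatch {
    // Returns { caughtByCaller, taskRan }.
    static boolean[] runDemo() throws InterruptedException {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        CountDownLatch ran = new CountDownLatch(1);

        // The task signals that it started, then throws on the scheduler thread.
        Runnable task = () -> {
            ran.countDown();
            throw new IllegalStateException("thrown on the scheduler thread");
        };

        boolean caughtByCaller = false;
        try {
            // schedule() returns immediately; the task has not run yet.
            scheduler.schedule(task, 10, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            // Never reached: the exception is not thrown on this thread.
            caughtByCaller = true;
        }

        boolean taskRan = ran.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        return new boolean[] { caughtByCaller, taskRan };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = runDemo();
        System.out.println("caught by caller: " + result[0]
            + ", task ran: " + result[1]);
    }
}
```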



--
This message was sent by Atlassian JIRA
(v6.2#6252)
