[ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:
----------------------------
    Description: 
Cleaning up an application may take a long time on the worker. While the
cleanup runs, the worker cannot send heartbeats to the master, because the
worker extends *ThreadSafeRpcEndpoint* and therefore handles its messages one
at a time. If a worker's heartbeat is blocked by the *ApplicationFinished*
message, the master will think the worker is dead. If the worker is running a
driver, the master will then schedule that driver again. So I think this is a
bug in Spark. It could be solved with the following suggestions:

1. Clean up the application in a separate asynchronous thread, such as
'cleanupThreadExecutor', so that it won't block other rpc messages like
*SendHeartbeat*;

2. Don't send the heartbeat to the master through the Rpc channel, because any
other rpc message may block that channel. Instead, send the heartbeat to the
master from an asynchronous timer thread.

  was:
Cleaning up an application may take a long time on the worker. While the
cleanup runs, the worker cannot send heartbeats to the master, because the
worker extends *ThreadSafeRpcEndpoint* and therefore handles its messages one
at a time. If a worker's heartbeat is blocked by the *ApplicationFinished*
message, the master will think the worker is dead. If the worker is running a
driver, the master will then schedule that driver again. So I think this is a
bug in Spark. It could be solved with the following suggestions:

1. Clean up the application in a separate asynchronous thread, such as
'cleanupThreadExecutor', so that it won't block other rpc messages like
SendHeartbeat;

2. Don't send the heartbeat to the master through the rpc channel, because any
other rpc message may block that channel. Instead, send the heartbeat to the
master from an asynchronous timer thread.


> Sending the heartbeat to master from worker may be blocked by other rpc messages
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-19831
>                 URL: https://issues.apache.org/jira/browse/SPARK-19831
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: hustfxj
>
> Cleaning up an application may take a long time on the worker. While the
> cleanup runs, the worker cannot send heartbeats to the master, because the
> worker extends *ThreadSafeRpcEndpoint* and therefore handles its messages
> one at a time. If a worker's heartbeat is blocked by the
> *ApplicationFinished* message, the master will think the worker is dead. If
> the worker is running a driver, the master will then schedule that driver
> again. So I think this is a bug in Spark. It could be solved with the
> following suggestions:
> 1. Clean up the application in a separate asynchronous thread, such as
> 'cleanupThreadExecutor', so that it won't block other rpc messages like
> *SendHeartbeat*;
> 2. Don't send the heartbeat to the master through the Rpc channel, because
> any other rpc message may block that channel. Instead, send the heartbeat to
> the master from an asynchronous timer thread.
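
The first suggestion can be illustrated with a minimal, self-contained sketch. It does not use any Spark classes. A single-threaded executor stands in for an endpoint that handles one message at a time, and the names `HeartbeatSketch`, `messageLoop`, `cleanupExecutor`, and `heartbeatHandledDuringCleanup` are hypothetical, chosen only for this illustration. A latch simulates a cleanup that takes a long time, and the 500 ms wait is an arbitrary assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HeartbeatSketch {

    /**
     * Returns true if a heartbeat message is handled while an application
     * cleanup is still in progress, false if it stays stuck behind the cleanup.
     */
    static boolean heartbeatHandledDuringCleanup(boolean offloadCleanup) throws Exception {
        // One thread handles all messages in order (stand-in for a thread-safe endpoint).
        ExecutorService messageLoop = Executors.newSingleThreadExecutor();
        // Hypothetical dedicated executor for cleanup work.
        ExecutorService cleanupExecutor = Executors.newSingleThreadExecutor();
        // Kept closed until the end to simulate a long-running cleanup.
        CountDownLatch cleanupMayFinish = new CountDownLatch(1);

        Runnable slowCleanup = () -> {
            try {
                cleanupMayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        try {
            // "ApplicationFinished": clean up inline, or hand off and return at once.
            messageLoop.submit(() -> {
                if (offloadCleanup) {
                    cleanupExecutor.submit(slowCleanup);
                } else {
                    slowCleanup.run();
                }
            });
            // "SendHeartbeat": queued behind the message above.
            Future<?> heartbeat = messageLoop.submit(() -> { });

            try {
                heartbeat.get(500, TimeUnit.MILLISECONDS);
                return true;   // heartbeat ran while cleanup was still pending
            } catch (TimeoutException e) {
                return false;  // heartbeat is blocked by the cleanup
            }
        } finally {
            // Let the simulated cleanup finish and release both threads.
            cleanupMayFinish.countDown();
            messageLoop.shutdown();
            cleanupExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("inline cleanup:    " + heartbeatHandledDuringCleanup(false));
        System.out.println("offloaded cleanup: " + heartbeatHandledDuringCleanup(true));
    }
}
```

With inline cleanup the heartbeat waits behind the cleanup on the only message-handling thread. With the cleanup handed off to its own executor, the handler returns immediately and the heartbeat is processed right away.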



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

