[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897789#comment-15897789 ]
Shixiong Zhu commented on SPARK-19831:
--------------------------------------

Code running in the receive method should be quick. If it is not, that code should be run in a separate thread. Which part of the Worker code did you find to be very slow?

> Sending the heartbeat master from worker maybe blocked by other rpc messages
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-19831
>                 URL: https://issues.apache.org/jira/browse/SPARK-19831
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: hustfxj
>            Priority: Minor
>
> Cleaning up an application may take a long time on the worker, and this
> blocks the worker from sending heartbeats to the master, because the worker
> extends *ThreadSafeRpcEndpoint*. If a heartbeat from a worker is blocked by
> the *ApplicationFinished* message, the master will think the worker is dead.
> If the worker is running a driver, the master will schedule that driver
> again. So I think this is a bug in Spark. The following suggestions may
> solve the problem:
> 1. Run the application cleanup in a separate asynchronous thread such as
> 'cleanupThreadExecutor', so that it does not block other rpc messages like
> *SendHeartbeat*;
> 2. Do not send the heartbeat to the master over the Rpc channel, because any
> other rpc message may block that channel. Send the heartbeat to the master
> from an asynchronous timer thread instead.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
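To illustrate the point about keeping the receive method quick, here is a minimal toy model in Python. It is not Spark's Worker code. The `Endpoint` class, the message names `"app_finished"` and `"heartbeat"`, and the 0.5 s sleep standing in for a slow cleanup are all assumptions made for this sketch. One thread handles messages serially, roughly as a thread-safe endpoint would, and the slow cleanup is either run inline or handed to a separate single-thread executor.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Endpoint:
    """Toy stand-in for a thread-safe RPC endpoint (hypothetical, not Spark's API).

    A single thread takes messages from the inbox and handles them one at a time.
    """

    def __init__(self, offload_cleanup):
        self.inbox = queue.Queue()
        self.offload_cleanup = offload_cleanup
        # Separate single-thread pool for slow cleanup work.
        self.cleanup_pool = ThreadPoolExecutor(max_workers=1)
        self.heartbeat_delays = []  # seconds each heartbeat waited before handling
        self.cleaned = []           # application ids whose cleanup has finished
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, kind, payload=None):
        # Record the send time so the handler can measure queueing delay.
        self.inbox.put((kind, payload, time.monotonic()))

    def _cleanup(self, app_id):
        time.sleep(0.5)  # assumed stand-in for slow application cleanup
        self.cleaned.append(app_id)

    def _loop(self):
        while True:
            kind, payload, sent_at = self.inbox.get()
            if kind == "stop":
                break
            if kind == "heartbeat":
                self.heartbeat_delays.append(time.monotonic() - sent_at)
            elif kind == "app_finished":
                if self.offload_cleanup:
                    # Hand the slow work off; the message loop continues at once.
                    self.cleanup_pool.submit(self._cleanup, payload)
                else:
                    # Slow work runs inline and blocks every later message.
                    self._cleanup(payload)

    def stop(self):
        self.send("stop")
        self._thread.join()
        self.cleanup_pool.shutdown(wait=True)  # wait for pending cleanup


def run(offload_cleanup):
    """Send a cleanup request followed by a heartbeat; return the heartbeat delay."""
    ep = Endpoint(offload_cleanup)
    ep.send("app_finished", "app-1")
    ep.send("heartbeat")
    ep.stop()
    return ep.heartbeat_delays[0], ep.cleaned


if __name__ == "__main__":
    for offload in (False, True):
        delay, _ = run(offload)
        print(f"offload_cleanup={offload}: heartbeat waited {delay:.3f}s")
```

With the cleanup run inline, the heartbeat waits for roughly the full cleanup time. With the cleanup offloaded, the heartbeat is handled almost immediately and the cleanup still completes in the background.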