[ https://issues.apache.org/jira/browse/SPARK-30297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

haiyangyu updated SPARK-30297:
------------------------------
    Description: 
h3. *Background*

If an executor is lost on the network without sending an RST or close packet 
back to the driver, the driver cannot detect the loss through a broken network 
connection. The only way the driver learns that the executor is dead is when 
its heartbeat expires.
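A minimal sketch of the heartbeat-expiry check described above, assuming one fixed timeout for all executors. The `HeartbeatTracker` class and its methods are hypothetical and only model the idea; they are not Spark's HeartbeatReceiver API.

```python
import time
from typing import Dict, List, Optional


# Hypothetical toy model of heartbeat-based liveness detection (not Spark's API).
# An executor that silently disappears from the network (no RST/close packet)
# is only noticed once its last heartbeat is older than the timeout.
class HeartbeatTracker:
    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, executor_id: str, now: Optional[float] = None) -> None:
        # Record the time of the most recent heartbeat from this executor.
        self.last_seen[executor_id] = time.monotonic() if now is None else now

    def expire_dead(self, now: Optional[float] = None) -> List[str]:
        # Return executors whose heartbeat has expired and stop tracking them.
        now = time.monotonic() if now is None else now
        expired = [e for e, t in self.last_seen.items() if now - t > self.timeout_s]
        for e in expired:
            del self.last_seen[e]
        return expired


tracker = HeartbeatTracker(timeout_s=120.0)
tracker.heartbeat("exec-1", now=0.0)    # exec-1 goes silent after this
tracker.heartbeat("exec-2", now=0.0)
tracker.heartbeat("exec-2", now=100.0)  # exec-2 keeps heartbeating
print(tracker.expire_dead(now=130.0))
```

In this model exec-1 is only reported as dead at t=130, once the 120-second timeout of the example has passed. Because `expire_dead` stops tracking an expired executor, nothing in the model would detect the same executor a second time, which mirrors why a task later sent to it goes unnoticed.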
h3. *Scenario*
 # The executor's heartbeat expires, as described above.
 # HeartbeatReceiver calls the scheduler's executorLost to reschedule the tasks 
that were running on this executor.
 # HeartbeatReceiver kills the executor.

If the tasks are rescheduled before the executor has been removed from 
executorBackend, they can be scheduled onto the same dead executor again. The 
driver then sends a launch-task message to that executor, which never responds, 
and the driver cannot detect the failure through the heartbeat either, because 
the executor is already lost on the network. The application therefore hangs 
forever.
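The race can be reproduced in a toy model where rescheduling and executor removal are two separate steps. `ToyScheduler` and `handle_expiry` are hypothetical names; the model reduces what are asynchronous steps in Spark to a simple ordering choice, so treat it as an illustration, not as Spark's scheduler or its actual fix.

```python
from typing import Dict, List, Set


# Hypothetical toy scheduler illustrating the ordering problem (not Spark's API).
class ToyScheduler:
    def __init__(self, executors: List[str]) -> None:
        self.alive: Set[str] = set(executors)  # executors still offered for tasks
        self.assignment: Dict[str, str] = {}   # task id -> executor id

    def launch(self, task: str) -> str:
        # Pick the smallest registered executor id so the demo is deterministic.
        executor = min(self.alive)
        self.assignment[task] = executor
        return executor

    def reschedule_from(self, dead: str) -> Dict[str, str]:
        # Re-launch every task that was assigned to the dead executor.
        moved = {}
        for task, ex in list(self.assignment.items()):
            if ex == dead:
                moved[task] = self.launch(task)
        return moved

    def remove_executor(self, executor: str) -> None:
        self.alive.discard(executor)


def handle_expiry(sched: ToyScheduler, dead: str, remove_first: bool) -> Dict[str, str]:
    """Handle a heartbeat expiry, rescheduling either before or after removal."""
    if remove_first:
        sched.remove_executor(dead)
        return sched.reschedule_from(dead)
    moved = sched.reschedule_from(dead)  # the dead executor is still offered here
    sched.remove_executor(dead)
    return moved


buggy = ToyScheduler(["exec-1", "exec-2"])
buggy.launch("task-0")
print(handle_expiry(buggy, "exec-1", remove_first=False))

ordered = ToyScheduler(["exec-1", "exec-2"])
ordered.launch("task-0")
print(handle_expiry(ordered, "exec-1", remove_first=True))
```

With `remove_first=False`, task-0 is relaunched on exec-1, the executor that was just declared dead, and that assignment is never revisited. Removing the executor before rescheduling sends the task to exec-2 instead.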



> Executor heartbeat expired cause app hung up forever
> ----------------------------------------------------
>
>                 Key: SPARK-30297
>                 URL: https://issues.apache.org/jira/browse/SPARK-30297
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0, 2.4.4
>            Reporter: haiyangyu
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

