[ 
https://issues.apache.org/jira/browse/SPARK-43978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Konev updated SPARK-43978:
-------------------------------
    Description: Dropping blocks from memory to disk is executed inside a 
synchronized block on MemoryManager (see 
org.apache.spark.storage.memory.MemoryStore#evictBlocksToFreeSpace). Heartbeats 
include memory metrics that are retrieved from MemoryManager using synchronized 
methods. While blocks are being dropped, heartbeats cannot be sent because 
the Heartbeater is blocked. If dropping blocks takes longer than the network 
timeout, heartbeats are considered lost and the executor gets killed by the 
driver.  (was: Dropping blocks from memory to disk is executed inside a 
synchronized block on MemoryManager (see 
org.apache.spark.storage.memory.MemoryStore#evictBlocksToFreeSpace). Heartbeats 
include memory metrics that are retrieved from MemoryManager using synchronized 
methods. When blocks are being dropped, heartbeats cannot be sent as 
Heartbeater is blocks. If dropping blocks takes longer than the network 
timeout, heartbeats are considered lost and the executor gets killed by the 
driver.)

> Dropping blocks from memory to disk may result in heartbeat loss
> ----------------------------------------------------------------
>
>                 Key: SPARK-43978
>                 URL: https://issues.apache.org/jira/browse/SPARK-43978
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Igor Konev
>            Priority: Major
>
> Dropping blocks from memory to disk is executed inside a synchronized block 
> on MemoryManager (see 
> org.apache.spark.storage.memory.MemoryStore#evictBlocksToFreeSpace). 
> Heartbeats include memory metrics that are retrieved from MemoryManager using 
> synchronized methods. While blocks are being dropped, heartbeats cannot be 
> sent because the Heartbeater is blocked. If dropping blocks takes longer than the 
> network timeout, heartbeats are considered lost and the executor gets killed 
> by the driver.
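
The contention described above can be reproduced in isolation. The following
is a minimal Java sketch, not Spark code: ToyMemoryManager, pause and
heartbeatDelayMillis are hypothetical names invented for illustration, and the
"disk write" is simulated with a sleep. It only assumes what the description
states, namely that eviction runs while holding the manager's monitor and that
the metric getters used by heartbeats are synchronized on the same object.

```java
import java.util.concurrent.CountDownLatch;

final class HeartbeatContentionSketch {

    /** Hypothetical stand-in for MemoryManager; not Spark's real class. */
    static final class ToyMemoryManager {
        private long storageMemoryUsed = 1024;

        /** Metric getter as a heartbeat would call it: needs the manager's monitor. */
        synchronized long storageMemoryUsed() {
            return storageMemoryUsed;
        }

        /** Simulated eviction: the slow "disk write" runs while the monitor is held. */
        void evictBlocksToFreeSpace(long diskWriteMillis, CountDownLatch lockHeld) {
            synchronized (this) {
                lockHeld.countDown();      // signal that the monitor is now held
                pause(diskWriteMillis);    // Thread.sleep does not release the monitor
                storageMemoryUsed = 0;
            }
        }
    }

    /** Sleeps without exposing a checked exception to callers. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns how long (ms) one metric read was blocked by a concurrent eviction. */
    static long heartbeatDelayMillis(long diskWriteMillis) {
        ToyMemoryManager mm = new ToyMemoryManager();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread evictor = new Thread(
                () -> mm.evictBlocksToFreeSpace(diskWriteMillis, lockHeld), "evictor");
        evictor.start();
        try {
            lockHeld.await();              // wait until eviction owns the monitor
            long start = System.nanoTime();
            mm.storageMemoryUsed();        // the "heartbeat" blocks here
            long waited = (System.nanoTime() - start) / 1_000_000;
            evictor.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long timeoutMillis = 200;          // hypothetical stand-in for the network timeout
        long waited = heartbeatDelayMillis(500);
        System.out.println("metric read waited " + waited
                + " ms; exceeds timeout: " + (waited > timeoutMillis));
    }
}
```

The metric read cannot return until the simulated eviction leaves its
synchronized block, so the wait grows with the eviction time. The sketch
illustrates the blocking only and does not suggest a particular fix.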




