[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791960#comment-16791960 ]

Shixiong Zhu commented on SPARK-25449:
--------------------------------------

I think this patch actually fixed a bug introduced by 
https://github.com/apache/spark/commit/0514e8d4b69615ba8918649e7e3c46b5713b6540 
That commit didn't use the correct default timeout. Before this patch, setting 
`spark.executor.heartbeatInterval 30` would send a heartbeat every 30 ms, but 
each heartbeat RPC message timeout was 30 seconds.
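
A minimal sketch of the inconsistency, assuming the raw value is read through 
SparkConf's time helpers with different default units (illustrative call sites, 
not necessarily the exact ones in that commit):

    import org.apache.spark.SparkConf

    val conf = new SparkConf().set("spark.executor.heartbeatInterval", "30")

    // Read with a millisecond default unit when scheduling the heartbeat send...
    val intervalMs = conf.getTimeAsMs("spark.executor.heartbeatInterval", "10s")      // 30 ms
    // ...but with a second default unit for the RPC timeout, so it is 1000x larger.
    val timeoutSec = conf.getTimeAsSeconds("spark.executor.heartbeatInterval", "10s") // 30 s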

This patch just unifies the default time unit in all usages of 
"spark.executor.heartbeatInterval".

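Roughly, the unified declaration can be sketched with Spark's internal config 
builder (the exact entry in the patch may differ):

    import java.util.concurrent.TimeUnit
    import org.apache.spark.internal.config.ConfigBuilder

    // Declaring the interval once with an explicit time unit means the send
    // interval and the heartbeat RPC timeout both parse the raw value the same way.
    val EXECUTOR_HEARTBEAT_INTERVAL =
      ConfigBuilder("spark.executor.heartbeatInterval")
        .timeConf(TimeUnit.MILLISECONDS)
        .createWithDefaultString("10s")
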
> Don't send zero accumulators in heartbeats
> ------------------------------------------
>
>                 Key: SPARK-25449
>                 URL: https://issues.apache.org/jira/browse/SPARK-25449
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Mukul Murthy
>            Assignee: Mukul Murthy
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> Heartbeats sent from executors to the driver every 10 seconds contain metrics 
> and are generally on the order of a few KBs. However, for large jobs with 
> lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks 
> to die with heartbeat failures. We can mitigate this by not sending zero 
> metrics to the driver.
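
A minimal sketch of the mitigation described in the issue, assuming the 
executor-side heartbeat code can see each task's AccumulatorV2 instances (the 
helper name is illustrative):

    import org.apache.spark.util.AccumulatorV2

    // Keep only accumulators that have actually recorded something; zero-valued
    // ones add no information but inflate the heartbeat payload on large jobs.
    def nonZeroUpdates(accums: Seq[AccumulatorV2[_, _]]): Seq[AccumulatorV2[_, _]] =
      accums.filterNot(_.isZero)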


