[jira] [Commented] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791960#comment-16791960 ]

Shixiong Zhu commented on SPARK-25449:
--------------------------------------

I think this patch actually fixed a bug introduced by https://github.com/apache/spark/commit/0514e8d4b69615ba8918649e7e3c46b5713b6540
That commit didn't use the correct default timeout. Before this patch, using `spark.executor.heartbeatInterval 30` would send a heartbeat every 30 ms, but the timeout of each heartbeat RPC message was 30 seconds. This patch just unifies the default time unit in all usages of "spark.executor.heartbeatInterval".

> Don't send zero accumulators in heartbeats
> ------------------------------------------
>
> Key: SPARK-25449
> URL: https://issues.apache.org/jira/browse/SPARK-25449
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Mukul Murthy
> Assignee: Mukul Murthy
> Priority: Major
> Labels: release-notes
> Fix For: 3.0.0
>
>
> Heartbeats sent from executors to the driver every 10 seconds contain metrics
> and are generally on the order of a few KBs. However, for large jobs with
> lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks
> to die with heartbeat failures. We can mitigate this by not sending zero
> metrics to the driver.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
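The unit mismatch described in the comment above can be illustrated with a small sketch. This is not Spark's code: `parse_time_ms` and its suffix table are hypothetical names invented for this example, and the only assumption is that one call site treated a bare number as milliseconds while another treated it as seconds.

```python
import re

# Multipliers to milliseconds for the suffixes this sketch understands.
# (Hypothetical table, not taken from Spark.)
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_time_ms(value, default_unit):
    """Parse a duration such as "30", "30s" or "500ms" into milliseconds.

    A bare number carries no unit, so the caller's default_unit decides
    what it means. That is where two call sites can disagree.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if match is None:
        raise ValueError("cannot parse duration: %r" % value)
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in _UNIT_TO_MS:
        raise ValueError("unknown time unit: %r" % unit)
    return int(number) * _UNIT_TO_MS[unit]


# Same setting, read in two places with different default units.
heartbeat_interval_ms = parse_time_ms("30", default_unit="ms")
rpc_timeout_ms = parse_time_ms("30", default_unit="s")
print(heartbeat_interval_ms, rpc_timeout_ms)
```

With a bare value of 30, the first call yields 30 ms and the second 30,000 ms, matching the behaviour described in the comment. Writing an explicit suffix (for example "30s") makes both calls agree regardless of the default, which is one way to stay independent of the unit a given version assumes.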
[jira] [Commented] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791321#comment-16791321 ]

Xiao Li commented on SPARK-25449:
---------------------------------

This changed the unit of conf.

> Don't send zero accumulators in heartbeats
> ------------------------------------------
>
> Key: SPARK-25449
> URL: https://issues.apache.org/jira/browse/SPARK-25449
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Mukul Murthy
> Assignee: Mukul Murthy
> Priority: Major
> Labels: release-notes
> Fix For: 3.0.0
>
>
> Heartbeats sent from executors to the driver every 10 seconds contain metrics
> and are generally on the order of a few KBs. However, for large jobs with
> lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks
> to die with heartbeat failures. We can mitigate this by not sending zero
> metrics to the driver.
[jira] [Commented] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623876#comment-16623876 ]

Apache Spark commented on SPARK-25449:
--------------------------------------

User 'mukulmurthy' has created a pull request for this issue:
https://github.com/apache/spark/pull/22473

> Don't send zero accumulators in heartbeats
> ------------------------------------------
>
> Key: SPARK-25449
> URL: https://issues.apache.org/jira/browse/SPARK-25449
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Mukul Murthy
> Priority: Major
>
> Heartbeats sent from executors to the driver every 10 seconds contain metrics
> and are generally on the order of a few KBs. However, for large jobs with
> lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks
> to die with heartbeat failures. We can mitigate this by not sending zero
> metrics to the driver.
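The mitigation in the issue description (not sending zero metrics to the driver) can be sketched as a filter applied before a heartbeat payload is built. This is only an illustration under stated assumptions: the function name `nonzero_updates`, the metric names, and the flat name-to-number payload shape are all hypothetical and are not taken from the pull request.

```python
def nonzero_updates(updates):
    """Return only the metric updates whose value is not zero.

    `updates` maps a metric name to a numeric value. Zero-valued entries
    carry no information the receiver could not assume by default, so
    dropping them shrinks the payload without losing data.
    """
    return {name: value for name, value in updates.items() if value != 0}


# Hypothetical per-task metrics; most are still zero early in a task.
task_metrics = {
    "bytesRead": 4096,
    "bytesWritten": 0,
    "shuffleRecordsRead": 0,
    "recordsRead": 128,
}
payload = nonzero_updates(task_metrics)
print(sorted(payload))
```

The saving grows with the number of tasks, since each task contributes its own set of mostly-zero metrics to every heartbeat.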