[ https://issues.apache.org/jira/browse/SPARK-22754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-22754.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 19942
[https://github.com/apache/spark/pull/19942]

> Check spark.executor.heartbeatInterval setting in case of ExecutorLost
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22754
>                 URL: https://issues.apache.org/jira/browse/SPARK-22754
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 2.1.0
>            Reporter: zhoukang
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> If spark.executor.heartbeatInterval is larger than spark.network.timeout,
> it will almost always cause the exception below.
> {code:java}
> Job aborted due to stage failure: Task 4763 in stage 3.0 failed 4 times, most
> recent failure: Lost task 4763.3 in stage 3.0 (TID 22383, executor id: 4761,
> host: xxx): ExecutorLostFailure (executor 4761 exited caused by one of the
> running tasks) Reason: Executor heartbeat timed out after 154022 ms
> {code}
> Many users are not aware of this constraint and set
> spark.executor.heartbeatInterval incorrectly.
> We should check for this case when applications are submitted.
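
The check requested in this issue can be illustrated with a minimal sketch. It is not the code from pull request 19942. The helper names `time_to_ms` and `check_heartbeat_settings` are hypothetical, the configuration is modeled as a plain dict, and the fallback values (120s for spark.network.timeout, 10s for spark.executor.heartbeatInterval) are assumed from the documented Spark defaults. Only a subset of Spark's time-string suffixes is handled.

```python
import re

# Milliseconds per unit suffix (a subset of the suffixes Spark accepts).
_UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "min": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}


def time_to_ms(value, default_unit):
    """Convert a time string such as '120s' or '10000' to milliseconds.

    default_unit is applied when the string carries no suffix.
    """
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value.strip().lower())
    if match is None:
        raise ValueError(f"invalid time string: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in _UNIT_MS:
        raise ValueError(f"unknown time unit in {value!r}")
    return int(number) * _UNIT_MS[unit]


def check_heartbeat_settings(conf):
    """Raise ValueError unless the heartbeat interval is below the network timeout.

    conf is a plain dict of Spark property names to string values (hypothetical
    stand-in for a real configuration object).
    """
    # Assumed defaults: 120s network timeout, 10s heartbeat interval.
    timeout_ms = time_to_ms(conf.get("spark.network.timeout", "120s"), "s")
    heartbeat_ms = time_to_ms(
        conf.get("spark.executor.heartbeatInterval", "10s"), "ms"
    )
    if heartbeat_ms >= timeout_ms:
        raise ValueError(
            "spark.executor.heartbeatInterval "
            f"({heartbeat_ms} ms) must be smaller than "
            f"spark.network.timeout ({timeout_ms} ms)"
        )
```

Running such a check at submission time turns the late ExecutorLostFailure into an immediate configuration error. For example, `check_heartbeat_settings({"spark.executor.heartbeatInterval": "200s"})` raises `ValueError`, because 200s exceeds the assumed 120s default timeout.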