[ https://issues.apache.org/jira/browse/SPARK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-3612.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0
                   1.1.1
         Assignee: Sandy Ryza

https://github.com/apache/spark/pull/2487

> Executor shouldn't quit if heartbeat message fails to reach the driver
> ----------------------------------------------------------------------
>
>                 Key: SPARK-3612
>                 URL: https://issues.apache.org/jira/browse/SPARK-3612
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Reynold Xin
>            Assignee: Sandy Ryza
>             Fix For: 1.1.1, 1.2.0
>
>
> The thread started by Executor.startDriverHeartbeater can actually terminate
> the whole executor if AkkaUtils.askWithReply[HeartbeatResponse] throws an
> exception.
> I don't think we should quit the executor this way. At the very least, we
> would want to log a more meaningful exception than simply
> {code}
> 14/09/20 06:38:12 WARN AkkaUtils: Error sending message in 1 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> 14/09/20 06:38:45 WARN AkkaUtils: Error sending message in 2 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> 14/09/20 06:39:18 WARN AkkaUtils: Error sending message in 3 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> 14/09/20 06:39:21 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
> in thread Thread[Driver Heartbeater,5,main]
> org.apache.spark.SparkException: Error sending message [message =
> Heartbeat(281,[Lscala.Tuple2;@4d9294db,BlockManagerId(281,
> ip-172-31-7-55.eu-west-1.compute.internal, 52303))]
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30
> seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         ...
> 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
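The trace shows why the executor dies: the TimeoutException from the blocking ask escapes the heartbeat thread's run(), reaches ExecutorUncaughtExceptionHandler, and takes the whole process down. The remedy the issue asks for is to catch the failure inside the loop, log it, and keep heartbeating. A minimal sketch of that pattern (illustrative names only, e.g. `sendHeartbeat` stands in for the askWithReply call; this is not Spark's actual code):

```scala
// Sketch of a heartbeat loop that survives send failures instead of
// letting an uncaught exception terminate the thread (and the executor).
object ResilientHeartbeater {
  // Runs `maxBeats` heartbeat attempts, sleeping `intervalMs` between them.
  // Returns the number of failed attempts; a failure is logged, not fatal.
  def runLoop(sendHeartbeat: () => Unit, intervalMs: Long, maxBeats: Int): Int = {
    var failures = 0
    for (_ <- 1 to maxBeats) {
      try {
        // May throw, e.g. a TimeoutException from a blocked driver ask.
        sendHeartbeat()
      } catch {
        case e: Exception =>
          failures += 1
          // Log a meaningful message and retry on the next beat rather
          // than letting the exception escape run().
          System.err.println(s"WARN: heartbeat failed: ${e.getMessage}; will retry")
      }
      Thread.sleep(intervalMs)
    }
    failures
  }
}
```

With this shape, a transient driver timeout costs one missed heartbeat instead of the executor's life; the driver's own liveness tracking can still decide to expire an executor that misses too many beats in a row.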