[ https://issues.apache.org/jira/browse/SPARK-29276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944526#comment-16944526 ]
Jochen Hebbrecht commented on SPARK-29276:
------------------------------------------

Thanks, I've just sent out a mail on the mailing list :-)

> Spark job fails because of timeout to Driver
> --------------------------------------------
>
> Key: SPARK-29276
> URL: https://issues.apache.org/jira/browse/SPARK-29276
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 2.4.2
> Reporter: Jochen Hebbrecht
> Priority: Major
>
> Hi,
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
> to the cluster. The job gets accepted, but the YARN application fails
> with:
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> {code}
> It actually goes wrong at this line:
> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
> Now, I'm 100% sure Spark is OK and there's no bug, but there must be
> something wrong with my setup. I don't understand the code of the
> ApplicationMaster, so could somebody explain to me what it is trying to reach?
> Where exactly does the connection time out? Then I can at least debug it
> further, because I don't have a clue what it is doing :-)
> Thanks for any help!
> Jochen
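
For context on the timeout above: the trace shows ApplicationMaster.runDriver blocking in ThreadUtils.awaitResult until a future completes or 100000 ms pass. In YARN cluster mode that wait is, as far as I know, for the user application to create its SparkContext, bounded by spark.yarn.am.waitTime (documented default 100s), so it may not be a network connection at all. The Java sketch below only illustrates the general await-with-timeout pattern. AmWaitSketch and awaitOrReport are hypothetical names chosen for illustration, and nothing here is Spark's actual code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: AmWaitSketch and awaitOrReport are hypothetical names,
// not Spark classes or methods.
class AmWaitSketch {

    // Wait up to waitMillis for a value that another thread is expected to
    // supply, and report the outcome as a string.
    static String awaitOrReport(CompletableFuture<String> pending, long waitMillis) {
        try {
            return "READY: " + pending.get(waitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Nobody completed the future within the allowed time.
            return "FAILED: timed out after " + waitMillis + " ms";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAILED: interrupted";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        // Case 1: the value is supplied, so the wait returns immediately.
        CompletableFuture<String> ready = new CompletableFuture<>();
        ready.complete("context");
        System.out.println(awaitOrReport(ready, 100));

        // Case 2: the future is never completed, so the wait times out.
        CompletableFuture<String> never = new CompletableFuture<>();
        System.out.println(awaitOrReport(never, 100));
    }
}
```

If the analogy holds, the second case is the one in the log: the side that should complete the future (the user code starting up) never does so in time. Checking the driver's own stdout/stderr in the YARN container logs for a slow or failed start would then be a reasonable first debugging step.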