[ https://issues.apache.org/jira/browse/SPARK-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102960#comment-14102960 ]
Josh Rosen commented on SPARK-3139:
-----------------------------------

I used pssh + grep to search through the application logs on the workers and I couldn't find any ERRORs or Exceptions (I'm sure that I was searching the right log directories, since other searches return matches).

> Akka timeouts from ContextCleaner when cleaning shuffles
> --------------------------------------------------------
>
>                  Key: SPARK-3139
>                  URL: https://issues.apache.org/jira/browse/SPARK-3139
>              Project: Spark
>           Issue Type: Bug
>     Affects Versions: 1.1.0
>          Environment: 10 r3.2xlarge instances on EC2, running the scala-agg-by-key-int spark-perf test against master commit d7e80c2597d4a9cae2e0cb35a86f7889323f4cbb
>             Reporter: Josh Rosen
>             Priority: Blocker
>
> When running spark-perf tests on EC2, I have a job that's consistently logging the following Akka exceptions:
> {code}
> 14/08/19 22:07:12 ERROR spark.ContextCleaner: Error cleaning shuffle 0
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>     at scala.concurrent.Await$.result(package.scala:107)
>     at org.apache.spark.storage.BlockManagerMaster.removeShuffle(BlockManagerMaster.scala:118)
>     at org.apache.spark.ContextCleaner.doCleanupShuffle(ContextCleaner.scala:159)
>     at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:131)
>     at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:124)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:124)
>     at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:120)
>     at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:120)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1252)
>     at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:119)
>     at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
> {code}
> and
> {code}
> 14/08/19 22:07:12 ERROR storage.BlockManagerMaster: Failed to remove shuffle 0
> akka.pattern.AskTimeoutException: Timed out
>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
>     at akka.actor.Scheduler$$anon$11.run(Scheduler.scala:118)
>     at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:455)
>     at akka.actor.LightArrayRevolverScheduler$$anon$12.executeBucket$1(Scheduler.scala:407)
>     at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:411)
>     at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
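The first trace blocks in Await.result inside BlockManagerMaster.removeShuffle, which waits on Akka's ask timeout (the spark.akka.askTimeout setting, 30 seconds by default, matching the "[30 seconds]" in the trace); the second trace appears to be the same timed-out ask surfacing on the Akka scheduler side. If the RemoveShuffle request is merely slow rather than lost, raising that setting may mask the error while the root cause is investigated. A minimal sketch of that workaround, assuming a standalone driver program (the app name and timeout value are illustrative, not from the report):

{code}
// Workaround sketch only, not a fix: raise the Akka ask timeout that
// BlockManagerMaster.removeShuffle blocks on. Assumes the remove request
// is slow rather than lost; app name and value are illustrative.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-cleanup-timeout-workaround")
  .set("spark.akka.askTimeout", "120") // seconds; the default 30 matches the trace
val sc = new SparkContext(conf)
{code}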
> This doesn't seem to prevent the job from completing successfully, but it's a serious issue because it means that resources aren't being cleaned up.
> The test script, ScalaAggByKeyInt, runs each test 10 times, and I see the same error after each test, so this seems deterministically reproducible.
> I'll look at the executor logs to see if I can find more info there.
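For context on how this code path is reached: ContextCleaner tracks each ShuffleDependency through a weak reference, and once the dependency is garbage-collected, the cleaner thread calls BlockManagerMaster.removeShuffle, which is the blocking call that times out in the traces above. A minimal sketch that should exercise the same path, assuming an existing SparkContext sc (names are illustrative, and System.gc() is only a hint, so the timing is not guaranteed):

{code}
// Reproduction sketch (illustrative): build a shuffle, drop the last strong
// reference to the shuffled RDD, and nudge the GC so ContextCleaner's
// reference queue sees the dead ShuffleDependency and asks the
// BlockManagerMaster to remove the shuffle's blocks.
var shuffled = sc.parallelize(1 to 1000000)
  .map(i => (i % 100, i))
  .reduceByKey(_ + _)
shuffled.count()  // materializes shuffle 0
shuffled = null   // makes the ShuffleDependency unreachable
System.gc()       // a hint only; cleanup runs once the weak reference is enqueued
{code}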