Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/15267#discussion_r81158537 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -522,7 +525,9 @@ private[yarn] class YarnAllocator( // This is usually happened when explicitly killing a container, the result will be // returned in one AM-RM communication. So query RPC will be later than this completed // container process. - releasedExecutorLossReasons.put(eid, exitReason) + if (!executorsKilledByDriver.contains(eid)) { --- End diff -- there is a race here where if the executor went down for reason other then us kill after we called kill we won't get the right loss reason. ie killExecutor just adding it to list to be removed later, if it happened to die for another reason we would miss out on that. I don't think this is a big deal but we could put in a check here to compare reason.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org