[ https://issues.apache.org/jira/browse/SPARK-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253779#comment-14253779 ]
Apache Spark commented on SPARK-4896:
-------------------------------------

User 'ryan-williams' has created a pull request for this issue:
https://github.com/apache/spark/pull/2848

> Don't redundantly copy executor dependencies in Utils.fetchFile
> ---------------------------------------------------------------
>
> Key: SPARK-4896
> URL: https://issues.apache.org/jira/browse/SPARK-4896
> Project: Spark
> Issue Type: Improvement
> Reporter: Josh Rosen
>
> This JIRA is spun off from a comment by [~rdub] on SPARK-3967, quoted here:
> {quote}
> I've been debugging this issue as well and I think I've found an issue in
> {{org.apache.spark.util.Utils}} that is contributing to / causing the problem:
> {{Files.move}} on [line 390|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L390]
> is called even if {{targetFile}} exists and {{tempFile}} and {{targetFile}}
> are equal.
>
> The check on [line 379|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L379]
> seems to imply the desire to skip a redundant overwrite if the file is
> already there and has the contents that it should have.
>
> Gating the {{Files.move}} call on a further {{if (!targetFile.exists)}} fixes
> the issue for me; attached is a patch of the change.
>
> In practice all of my executors that hit this code path are finding every
> dependency JAR to already exist and be exactly equal to what they need it to
> be, meaning they were all needlessly overwriting all of their dependency
> JARs, and now are all basically no-op-ing in {{Utils.fetchFile}}; I've not
> determined who/what is putting the JARs there, why the issue only crops up in
> {{yarn-cluster}} mode (or {{--master yarn --deploy-mode cluster}}), etc., but
> it seems like either way this patch is probably desirable.
> {quote}
>
> I'm spinning this off into its own JIRA so that we can track the merging of
> https://github.com/apache/spark/pull/2848 separately (since we have multiple
> PRs that contribute to fixing the original issue).
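The gating proposed in the quoted comment (skip the move when the target already holds the expected contents) can be sketched in Scala. This is a minimal, hypothetical helper: `FetchSketch.moveIntoPlace` and its `fileOverwrite` parameter are illustrative names, not Spark's API, and the sketch models only the final "move the downloaded `tempFile` onto `targetFile`" step of `Utils.fetchFile`. It assumes `java.nio.file.Files` for the move (which may differ from the `Files` class referenced at line 390), a byte-for-byte comparison for "equal", and an `IllegalStateException` when contents differ and overwriting is disabled. The actual change is the one in the linked pull request.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
import java.util.Arrays

// Hypothetical sketch, NOT Spark's Utils.fetchFile: models only the final step
// of moving an already-downloaded temp file onto the target path.
object FetchSketch {

  // Byte-for-byte comparison. Reads both files fully into memory, which is
  // fine for a sketch but not ideal for very large JARs.
  private def sameContents(a: File, b: File): Boolean =
    a.length == b.length &&
      Arrays.equals(Files.readAllBytes(a.toPath), Files.readAllBytes(b.toPath))

  /** Returns true if tempFile was moved onto targetFile, false if the move was skipped. */
  def moveIntoPlace(tempFile: File, targetFile: File, fileOverwrite: Boolean): Boolean =
    if (targetFile.exists && sameContents(tempFile, targetFile)) {
      // Target already has the expected contents: discard the download
      // instead of redundantly overwriting the existing file.
      tempFile.delete()
      false
    } else {
      // Assumed policy: refuse to clobber a differing file unless allowed.
      if (targetFile.exists && !fileOverwrite) {
        throw new IllegalStateException(
          s"File $targetFile exists and does not match contents of $tempFile")
      }
      // Either the target is missing or it differs and may be replaced.
      Files.move(tempFile.toPath, targetFile.toPath, StandardCopyOption.REPLACE_EXISTING)
      true
    }
}
```

When the target already matches, the helper deletes the temp file and returns `false`, which corresponds to the "no-op-ing" behavior the comment describes; otherwise it moves the new file into place.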