HyukjinKwon commented on a change in pull request #30486: URL: https://github.com/apache/spark/pull/30486#discussion_r530279879
########## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ##########

```diff
@@ -1568,21 +1612,39 @@ class SparkContext(config: SparkConf) extends Logging {
     val key = if (!isLocal && scheme == "file") {
       env.rpcEnv.fileServer.addFile(new File(uri.getPath))
+    } else if (uri.getScheme == null) {
+      schemeCorrectedURI.toString
+    } else if (isArchive) {
+      uri.toString
     } else {
-      if (uri.getScheme == null) {
-        schemeCorrectedURI.toString
-      } else {
-        path
-      }
+      path
     }
+
     val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
-    if (addedFiles.putIfAbsent(key, timestamp).isEmpty) {
+    if (!isArchive && addedFiles.putIfAbsent(key, timestamp).isEmpty) {
       logInfo(s"Added file $path at $key with timestamp $timestamp")
       // Fetch the file locally so that closures which are run on the driver can still use the
       // SparkFiles API to access files.
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else if (
+      isArchive &&
+        addedArchives.putIfAbsent(
+          UriBuilder.fromUri(new URI(key)).fragment(uri.getFragment).build().toString,
+          timestamp).isEmpty) {
+      logInfo(s"Added archive $path at $key with timestamp $timestamp")
+      val uriToDownload = UriBuilder.fromUri(new URI(key)).fragment(null).build()
+      val source = Utils.fetchFile(uriToDownload.toString, Utils.createTempDir(), conf,
```

Review comment:

In fact, `--files` is documented the same way:

```
        |  --files FILES               Comma-separated list of files to be placed in the working
        |                              directory of each executor. File paths of these files
        |                              in executors can be accessed via SparkFiles.get(fileName).
```

I think we should fix the docs to say the files are also available on the driver side :-), but I would like to handle that separately.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
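For reference, the archive branch in the diff records the archive in `addedArchives` under a URI that keeps the user's `#fragment`, but fetches the same URI with the fragment stripped. The Scala sketch below reproduces only that split, using `java.net.URI` in place of the JAX-RS `UriBuilder` the patch uses. The helper `withFragment` and the `hdfs:///tmp/env.tar.gz#env` path are hypothetical, and the helper assumes simple, unescaped paths.

```scala
import java.net.URI

object ArchiveKeySketch {
  // Hypothetical stand-in for UriBuilder.fromUri(uri).fragment(f).build():
  // rebuilds the URI with its fragment replaced, or removed when `fragment` is null.
  // Assumes a plain path; escaped characters in the original are not preserved verbatim.
  def withFragment(uri: URI, fragment: String): URI =
    new URI(uri.getScheme, uri.getSchemeSpecificPart, fragment)

  def main(args: Array[String]): Unit = {
    // Hypothetical archive argument carrying a "#env" fragment.
    val uri = new URI("hdfs:///tmp/env.tar.gz#env")
    // For a non-local archive with a scheme, the diff uses uri.toString as `key`.
    val key: String = uri.toString

    // Entry recorded in addedArchives: the fragment is kept.
    val archiveKey = withFragment(new URI(key), uri.getFragment).toString
    // URI actually fetched: the fragment is dropped.
    val uriToDownload = withFragment(new URI(key), null).toString

    println(archiveKey)    // hdfs:///tmp/env.tar.gz#env
    println(uriToDownload) // hdfs:///tmp/env.tar.gz
  }
}
```

Because the fragment is part of the registry key, the same archive added under two different fragments would be tracked as two separate entries, while the download itself always targets the plain path.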