Github user kevinyu98 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13506#discussion_r65799162

    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       }
     
       /**
    +   * Delete a file that was added to be downloaded with this Spark job on every node.
    +   * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +   * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
    +   * use `SparkFiles.get(fileName)` to find its download location.
    +   */
    +  def deleteFile(path: String): Unit = {
    --- End diff --
    
Hi Reynold: Thanks very much for reviewing the code. Yes, it deletes the path from the `addedFiles` hash map; the path is converted into a key and stored in the map. `addFile` uses this logic to generate the key before storing it in the hash map, so in order to find the same entry, `deleteFile` has to generate the key with the same logic. For example, for a local file, `addFile` prepends a `file:` scheme to the path:

    spark.sql("add file /Users/qianyangyu/myfile.txt")

    scala> spark.sql("list file").show(false)
    +----------------------------------+
    |Results                           |
    +----------------------------------+
    |file:/Users/qianyangyu/myfile2.txt|
    |file:/Users/qianyangyu/myfile.txt |
    +----------------------------------+

But for a file at a remote location, it just keeps the path as-is:
    scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
    res17: org.apache.spark.sql.DataFrame = []

    scala> spark.sql("list file").show(false)
    +---------------------------------------------+
    |Results                                      |
    +---------------------------------------------+
    |file:/Users/qianyangyu/myfile.txt            |
    |hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt|
    +---------------------------------------------+

If the command is issued from a worker node and adds a local file, the path is added to the `NettyStreamManager`'s hash map, using that environment's path as the key to store in `addedFiles`.
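To make the key derivation concrete, here is a minimal standalone sketch of the logic described above (this is an illustration, not the actual Spark implementation; `fileKey` is a hypothetical helper name): a path with no URI scheme is treated as a local file and normalized to a `file:` URI, while a path that already carries a scheme such as `hdfs://` or `http://` is kept unchanged.

```scala
import java.io.File
import java.net.URI

object FileKeySketch {
  // Hypothetical sketch: derive the hash-map key the same way for
  // addFile and deleteFile, so lookups find the same entry.
  def fileKey(path: String): String = {
    val scheme = new URI(path).getScheme
    scheme match {
      // No scheme: a local path like /Users/qianyangyu/myfile.txt is
      // normalized to a file: URI, e.g. file:/Users/qianyangyu/myfile.txt
      case null => new File(path).getCanonicalFile.toURI.toString
      // A scheme is already present (hdfs, http, https, ftp, ...):
      // keep the path as-is, matching the "list file" output above.
      case _ => path
    }
  }

  def main(args: Array[String]): Unit = {
    println(fileKey("/Users/qianyangyu/myfile.txt"))
    println(fileKey("hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt"))
  }
}
```

Because both operations run the path through the same normalization, a user can call `deleteFile` with the same string they passed to `addFile` and still hit the key that `addFile` stored.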