Github user kevinyu98 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13506#discussion_r65799162
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends 
Logging with ExecutorAllocationCli
       }
     
       /**
    +   * Delete a file to be downloaded with this Spark job on every node.
    +   * The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
    +   * filesystems), or an HTTP, HTTPS or FTP URI.  To access the file in 
Spark jobs,
    +   * use `SparkFiles.get(fileName)` to find its download location.
    +   *
    +   */
    +  def deleteFile(path: String): Unit = {
    --- End diff --
    
    Hi Reynold: Thanks very much for reviewing the code.
    Yes, it deletes the path from the addedFiles hashmap: when a file is added, the path is normalized into a key and stored in the map.
    addFile uses this logic to generate the key it stores in the hashmap, so in order to find the same key, deleteFile has to use the same logic to generate it.
    For example, for a local file, addFile will prepend a 'file:' scheme to the path.
    
    spark.sql("add file /Users/qianyangyu/myfile.txt")
    
    scala> spark.sql("list file").show(false)
    +----------------------------------+
    |Results                           |
    +----------------------------------+
    |file:/Users/qianyangyu/myfile2.txt|
    |file:/Users/qianyangyu/myfile.txt |
    +----------------------------------+
    
    but for a file at a remote location, it will keep the path as-is.
    
    scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
    res17: org.apache.spark.sql.DataFrame = []
    
    scala> spark.sql("list file").show(false)
    +---------------------------------------------+
    |Results                                      |
    +---------------------------------------------+
    |file:/Users/qianyangyu/myfile.txt            |
    |hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt|
    +---------------------------------------------+
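    The normalization shown in the two listings above could be sketched like this (a hedged sketch only, not Spark's actual code; `toKey` is a hypothetical helper name):

```scala
import java.io.File
import java.net.URI

object FileKeySketch {
  // A bare local path gains a "file:" scheme via File.toURI; a path that
  // already carries a scheme (hdfs://, http://, ...) is kept as-is.
  def toKey(path: String): String =
    if (new URI(path).getScheme == null) new File(path).toURI.toString
    else path
}
```

    With this, toKey("/Users/qianyangyu/myfile.txt") yields "file:/Users/qianyangyu/myfile.txt", matching the "list file" output above, while an hdfs:// URI passes through unchanged.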
    
    If the command is issued from a worker node to add a local file, the path is added into the NettyStreamManager's hashmap, using that environment's path as the key stored in addedFiles.

