Github user zheh12 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21257#discussion_r186603145
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -207,9 +207,25 @@ case class InsertIntoHadoopFsRelationCommand(
         }
         // first clear the path determined by the static partition keys (e.g. /table/foo=1)
         val staticPrefixPath = qualifiedOutputPath.suffix(staticPartitionPrefix)
    -    if (fs.exists(staticPrefixPath) && !committer.deleteWithJob(fs, staticPrefixPath, true)) {
    -      throw new IOException(s"Unable to clear output " +
    -        s"directory $staticPrefixPath prior to writing to it")
    +
    +    // check whether to delete the whole dir or just its sub-files
    +    if (fs.exists(staticPrefixPath)) {
    +      // check if it is the table root, and record the files to delete
    +      if (staticPartitionPrefix.isEmpty) {
    +        val files = fs.listFiles(staticPrefixPath, false)
    +        while (files.hasNext) {
    +          val file = files.next()
    +          if (!committer.deleteWithJob(fs, file.getPath, true)) {
    --- End diff --
    
    1. From the code's point of view, in the current implementation `deleteMatchingPartitions` happens only if `overwrite` is specified.
    2. Using `dynamicPartitionOverwrite` will not solve this problem, because it also generates a `.stage` directory under the table root directory. We still need to record all the files we want to delete, but we cannot simply delete the root directory itself.
    Dynamic partition overwrite actually records all the partitions that need to be deleted and then deletes them one by one. A whole-table `overwrite` deletes all the data in the table directory, so it likewise needs to record every partition directory and file it removes; in that sense this implementation is similar to `dynamicPartitionOverwrite`. See the sketch below.
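
    For illustration, here is a minimal sketch (not the actual patch) of the approach the diff takes: delete the direct children of the table root one by one instead of deleting the root directory itself, so the root, and anything like a staging directory created under it, can survive. `deleteWithJob` stands in for `committer.deleteWithJob` from the diff, and `listStatus` is used here to enumerate direct children (the diff uses `listFiles`); the helper name is hypothetical:

    ```scala
    import java.io.IOException
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Sketch only: clear a table root by deleting its direct children
    // rather than the root directory itself.
    def clearTableRoot(
        fs: FileSystem,
        tableRoot: Path,
        deleteWithJob: (FileSystem, Path, Boolean) => Boolean): Unit = {
      // listStatus returns direct children: data files and partition dirs
      fs.listStatus(tableRoot).foreach { status =>
        val child = status.getPath
        if (!deleteWithJob(fs, child, true)) {  // true = recursive delete
          throw new IOException(
            s"Unable to clear output file $child prior to writing to it")
        }
      }
    }
    ```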

