Github user zheh12 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21257#discussion_r186603145

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -207,9 +207,25 @@ case class InsertIntoHadoopFsRelationCommand(
         }
         // first clear the path determined by the static partition keys (e.g. /table/foo=1)
         val staticPrefixPath = qualifiedOutputPath.suffix(staticPartitionPrefix)
-        if (fs.exists(staticPrefixPath) && !committer.deleteWithJob(fs, staticPrefixPath, true)) {
-          throw new IOException(s"Unable to clear output " +
-            s"directory $staticPrefixPath prior to writing to it")
+
+        // check whether to delete the directory itself or just the files under it
+        if (fs.exists(staticPrefixPath)) {
+          // check whether this is the table root, and record each file to delete
+          if (staticPartitionPrefix.isEmpty) {
+            val files = fs.listFiles(staticPrefixPath, false)
+            while (files.hasNext) {
+              val file = files.next()
+              if (!committer.deleteWithJob(fs, file.getPath, true)) {
--- End diff --

1. From the code's point of view, in the current implementation `deleteMatchingPartitions` happens only when `overwrite` is specified.

2. Using `dynamicPartitionOverwrite` will not solve this problem, because it also generates a `.stage` directory under the table root directory. We still need to record all the files we want to delete, and we cannot simply delete the root directory itself. Dynamic partition overwrite works by recording all the partitions that need to be deleted and then deleting them one by one. A whole-table `overwrite`, which clears all data under the directory, likewise needs to record every partition directory and file it deletes, so the implementation here is in fact similar to `dynamicPartitionOverwrite`.
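The idea in point 2 — delete each immediate child under the table root, recording every deleted path, instead of removing the root directory itself — can be sketched without Hadoop. This is a minimal stand-in using the local filesystem; `clear_table_root` is a hypothetical helper, whereas the real patch goes through `FileSystem.listFiles(path, false)` and `committer.deleteWithJob`:

```python
import shutil
import tempfile
from pathlib import Path

def clear_table_root(root: Path) -> list[Path]:
    """Delete each immediate child of the table root, but never the root
    itself, and record every path deleted (the root must survive so that
    staging directories like `.stage` can later be created under it)."""
    deleted = []
    # non-recursive listing, analogous to fs.listFiles(staticPrefixPath, false)
    for child in root.iterdir():
        if child.is_dir():
            shutil.rmtree(child)   # a partition directory such as foo=1
        else:
            child.unlink()         # a top-level file such as _SUCCESS
        deleted.append(child)      # record it, as deleteWithJob would
    return deleted

# usage: the root directory remains in place, only its contents are gone
root = Path(tempfile.mkdtemp())
(root / "foo=1").mkdir()
(root / "foo=1" / "part-00000").write_text("data")
(root / "_SUCCESS").write_text("")
removed = clear_table_root(root)
print(root.exists(), sorted(p.name for p in removed))
```

The key design point this models: because the root is preserved, the overwrite can still write its staging output under the same directory, while the recorded list of deleted paths plays the role that the per-partition bookkeeping plays in dynamic partition overwrite.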