comphead commented on issue #2970:
URL: https://github.com/apache/datafusion-comet/issues/2970#issuecomment-3697596047
Some more details on the implementation. The flow is:
```
User calls: df.write.mode("overwrite").save(path)
  ↓
DataFrameWriter.saveInternal()
  ↓
InsertIntoHadoopFsRelationCommand.run()
  ↓
[Line 131] deleteMatchingPartitions(fs, qualifiedOutputPath, customPartitionLocations, committer)
  ↓
[Line 238] committer.deleteWithJob(fs, staticPrefixPath, true)
  ↓
[Line 183] fs.delete(path, recursive=true)  ← ACTUAL DELETION HAPPENS HERE
```
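For context, here is a minimal sketch of user code that reaches this path (the output path, column names, and object name are illustrative, not from the issue):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object OverwriteFlowExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "2024-01-01"), (2, "2024-01-02")).toDF("id", "date")

    // SaveMode.Overwrite on a partitioned file-source write goes through
    // InsertIntoHadoopFsRelationCommand.run(), which is where the
    // deleteMatchingPartitions / deleteWithJob chain above kicks in.
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("date")
      .parquet("/tmp/overwrite-example")

    spark.stop()
  }
}
```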
The relevant Spark implementation currently looks like this:
```
val doInsertion = if (mode == SaveMode.Append) {
  true
} else {
  val pathExists = fs.exists(qualifiedOutputPath)
  (mode, pathExists) match {
    case (SaveMode.ErrorIfExists, true) =>
      throw QueryCompilationErrors.outputPathAlreadyExistsError(qualifiedOutputPath)
    case (SaveMode.Overwrite, true) =>
      if (ifPartitionNotExists && matchingPartitions.nonEmpty) {
        false
      } else if (dynamicPartitionOverwrite) {
        // For dynamic partition overwrite, do not delete partition directories ahead.
        true
      } else {
        deleteMatchingPartitions(fs, qualifiedOutputPath, customPartitionLocations, committer)
        true
      }
    case (SaveMode.Overwrite, _) | (SaveMode.ErrorIfExists, false) =>
      true
    case (SaveMode.Ignore, exists) =>
      !exists
    case (s, exists) =>
      throw QueryExecutionErrors.saveModeUnsupportedError(s, exists)
  }
}
```
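Whether the `dynamicPartitionOverwrite` branch is taken is controlled by `spark.sql.sources.partitionOverwriteMode` ("static" by default, "dynamic" to defer deletion to commit time). A minimal sketch showing how a user enables it (the output path and object name are illustrative):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DynamicOverwriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      // "dynamic" makes Spark skip the up-front deleteMatchingPartitions
      // call and instead replace only the partitions being written,
      // at commit time.
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "2024-01-01"), (2, "2024-01-02")).toDF("id", "date")
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("date")
      .parquet("/tmp/dynamic-overwrite-example")

    spark.stop()
  }
}
```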
The key point here: we need to support `dynamicPartitionOverwrite` in Comet, or fall back to Spark in this case.
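Until that support lands, a conservative fallback check could look roughly like the sketch below. This is only an illustration: `shouldFallBackToSpark` is a hypothetical helper, not an existing Comet API, and in Spark the actual `dynamicPartitionOverwrite` flag additionally requires `SaveMode.Overwrite` and a write where not all partition columns are static.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

object CometFallbackSketch {
  // Hypothetical helper, not an actual Comet API: fall back to Spark
  // whenever dynamic partition overwrite could be in effect.
  def shouldFallBackToSpark(spark: SparkSession): Boolean = {
    // Checking only the session conf is a deliberate over-approximation:
    // Spark also requires Overwrite mode and non-static partition columns
    // before it sets dynamicPartitionOverwrite = true.
    val mode = spark.conf.get(SQLConf.PARTITION_OVERWRITE_MODE.key)
    mode.equalsIgnoreCase("dynamic")
  }
}
```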