comphead commented on issue #2970:
URL: https://github.com/apache/datafusion-comet/issues/2970#issuecomment-3697596047
Some more details on the implementation. The flow is:
```
User calls: df.write.mode("overwrite").save(path)
  ↓
DataFrameWriter.saveInternal()
  ↓
InsertIntoHadoopFsRelationCommand.run()
  ↓
[Line 131] deleteMatchingPartitions(fs, qualifiedOutputPath, customPartitionLocations, committer)
  ↓
[Line 238] committer.deleteWithJob(fs, staticPrefixPath, true)
  ↓
[Line 183] fs.delete(path, recursive=true)  ← ACTUAL DELETION HAPPENS HERE
```
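For context, here is a minimal sketch of user code that reaches this path (the output path, column names, and object name are illustrative, not from the issue):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object OverwriteFlowExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "2024-01-01"), (2, "2024-01-02")).toDF("id", "date")

    // SaveMode.Overwrite on a partitioned file-source write goes through
    // InsertIntoHadoopFsRelationCommand.run(), which is where the
    // deleteMatchingPartitions / deleteWithJob chain above kicks in.
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("date")
      .parquet("/tmp/overwrite-example")

    spark.stop()
  }
}
```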
The relevant Spark implementation currently looks like this:
```
val doInsertion = if (mode == SaveMode.Append) {
  true
} else {
  val pathExists = fs.exists(qualifiedOutputPath)
  (mode, pathExists) match {
    case (SaveMode.ErrorIfExists, true) =>
      throw QueryCompilationErrors.outputPathAlreadyExistsError(qualifiedOutputPath)
    case (SaveMode.Overwrite, true) =>
      if (ifPartitionNotExists && matchingPartitions.nonEmpty) {
        false
      } else if (dynamicPartitionOverwrite) {
        // For dynamic partition overwrite, do not delete partition directories ahead.
        true
      } else {
        deleteMatchingPartitions(fs, qualifiedOutputPath, customPartitionLocations, committer)
        true
      }
    case (SaveMode.Overwrite, _) | (SaveMode.ErrorIfExists, false) =>
      true
    case (SaveMode.Ignore, exists) =>
      !exists
    case (s, exists) =>
      throw QueryExecutionErrors.saveModeUnsupportedError(s, exists)
  }
}
```
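Whether the `dynamicPartitionOverwrite` branch is taken is controlled by `spark.sql.sources.partitionOverwriteMode` ("static" by default, "dynamic" to defer deletion to commit time). A minimal sketch showing how a user enables it (the output path and object name are illustrative):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DynamicOverwriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      // "dynamic" makes Spark skip the up-front deleteMatchingPartitions
      // call and instead replace only the partitions being written,
      // at commit time.
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "2024-01-01"), (2, "2024-01-02")).toDF("id", "date")
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("date")
      .parquet("/tmp/dynamic-overwrite-example")

    spark.stop()
  }
}
```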
The key point here: we need to support `dynamicPartitionOverwrite` in Comet, or fall back to Spark in this case.
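Until that support lands, a conservative fallback check could look roughly like the sketch below. This is only an illustration: `shouldFallBackToSpark` is a hypothetical helper, not an existing Comet API, and in Spark the actual `dynamicPartitionOverwrite` flag additionally requires `SaveMode.Overwrite` and a write where not all partition columns are static.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

object CometFallbackSketch {
  // Hypothetical helper, not an actual Comet API: fall back to Spark
  // whenever dynamic partition overwrite could be in effect.
  def shouldFallBackToSpark(spark: SparkSession): Boolean = {
    // Checking only the session conf is a deliberate over-approximation:
    // Spark also requires Overwrite mode and non-static partition columns
    // before it sets dynamicPartitionOverwrite = true.
    val mode = spark.conf.get(SQLConf.PARTITION_OVERWRITE_MODE.key)
    mode.equalsIgnoreCase("dynamic")
  }
}
```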