Re: [PR] perf: Optimize contains expression with SIMD-based scalar pattern sea… [datafusion-comet]

via GitHub Mon, 29 Dec 2025 20:19:44 -0800


Shekharrajak commented on code in PR #2991:
URL: https://github.com/apache/datafusion-comet/pull/2991#discussion_r2652181481



##########
spark/src/main/scala/org/apache/comet/serde/operator/CometDataWritingCommand.scala:
##########
@@ -50,6 +50,11 @@ object CometDataWritingCommand extends CometOperatorSerde[DataWritingCommandExec
   override def getSupportLevel(op: DataWritingCommandExec): SupportLevel = {
     op.cmd match {
       case cmd: InsertIntoHadoopFsRelationCommand =>
+        // Skip INSERT OVERWRITE DIRECTORY operations (catalogTable is None for directory writes)
+        if (cmd.catalogTable.isEmpty) {

Review Comment:
   <img width="1716" height="937" alt="Image" src="https://github.com/user-attachments/assets/3a4ec7ca-bb45-4cd8-a2b6-3b2d5e3b1382" />
   
   This change fixes the following error:
   ```
   ERROR org.apache.spark.sql.execution.command.InsertIntoDataSourceDirCommand: Failed to write to directory Some(file:/__w/datafusion-comet/datafusion-comet/apache-spark/target/tmp/spark-76b62d31-5bd6-4d4b-9770-262cb08e84f3)
   org.apache.spark.sql.AnalysisException: [COLUMN_ALREADY_EXISTS] The column `id` already exists. Choose another name or rename the existing column. SQLSTATE: 42711
        at org.apache.spark.sql.errors.QueryCompilationErrors$.columnAlreadyExistsError(QueryCompilationErrors.scala:2700)
        at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:151)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:86)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:117)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:115)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:129)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$2(QueryExecution.scala:155)
   [info] - SPARK-25389 INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS with duplicated names(caseSensitivity=true, format=orc) (22 milliseconds)
   18:44:25.173 ERROR org.apache.spark.sql.execution.command.InsertIntoDataSourceDirCommand: Failed to write to directory Some(file:/__w/datafusion-comet/datafusion-comet/apache-spark/target/tmp/spark-76ef391d-5d5f-4997-afb4-97ac714c1697)
   ```
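
A minimal sketch of the guard described above may help show the intent. It uses hypothetical stand-in types (`SupportLevel`, `Compatible`, `Unsupported`, `InsertCommand`, `DirectoryWriteGuard`) and not the real Spark or Comet classes. The diff hunk is cut off before the body of the `if`, so returning an "unsupported" result for directory writes is an assumption here, not the confirmed content of the PR.

```scala
// Hypothetical stand-ins for illustration only; these are NOT the real
// Spark or Comet types.
sealed trait SupportLevel
case object Compatible extends SupportLevel
final case class Unsupported(reason: String) extends SupportLevel

// Models only the one field the guard inspects: the target table, if any.
final case class InsertCommand(catalogTable: Option[String])

object DirectoryWriteGuard {
  // Directory writes carry no catalog table, so an empty catalogTable is
  // treated as "INSERT OVERWRITE DIRECTORY".
  // Assumption: the native path declines such writes so that the regular
  // Spark write path handles them.
  def getSupportLevel(cmd: InsertCommand): SupportLevel =
    if (cmd.catalogTable.isEmpty) {
      Unsupported("INSERT OVERWRITE DIRECTORY is left to Spark")
    } else {
      Compatible
    }
}
```

With these stand-ins, `DirectoryWriteGuard.getSupportLevel(InsertCommand(None))` yields an `Unsupported` value, while a command with a catalog table yields `Compatible`.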



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


