coderfender commented on PR #3542:
URL:
https://github.com/apache/datafusion-comet/pull/3542#issuecomment-3917679967
@0lai0, @andygrove, we might want to hold off on merging this PR. There is a test failure, and I am not sure we have covered all possible `Literal` conditions in our case statement. Steps to reproduce the SQL failure:
```scala
test("concat_ws test - no constant folding") {
  withSQLConf(
    "spark.sql.optimizer.excludedRules" ->
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
    withParquetTable(Seq(1, 2).map(Tuple1(_)), "t") {
      val df = sql("SELECT concat_ws(',', NULL, 'b', 'c'), concat_ws(NULL, 'a', 'b') FROM t")
      df.explain(true)
      checkSparkAnswerAndOperator(df)
    }
  }
}
```
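For context, here is a minimal, Spark-free sketch of the kind of match gap that can produce "Expected string literal, got None": a pattern that only accepts non-null string literals silently yields `None` for a `NULL` separator. The `Literal` model and function names below are illustrative assumptions, not Comet's actual serde code.

```scala
// Illustrative stand-in for Catalyst's Literal; not the real class.
case class Literal(value: Any, dataType: String)

object ConcatWsSketch {
  // A match covering only non-null string literals: a NULL separator falls
  // through to None, mirroring the "Expected string literal, got None" failure.
  def separatorNarrow(expr: Literal): Option[String] = expr match {
    case Literal(s: String, "string") => Some(s) // type pattern never matches null
    case _                            => None
  }

  // A match that also handles the null case explicitly, so a NULL separator
  // can be handled (or fall back to Spark) instead of failing natively.
  def separatorWithNull(expr: Literal): Either[String, Option[String]] = expr match {
    case Literal(null, "string")      => Right(None)    // NULL separator
    case Literal(s: String, "string") => Right(Some(s)) // constant separator
    case other                        => Left(s"unsupported literal: $other")
  }

  def main(args: Array[String]): Unit = {
    println(separatorNarrow(Literal(null, "string")))   // None -> the failure mode
    println(separatorWithNull(Literal(null, "string"))) // Right(None)
    println(separatorWithNull(Literal(",", "string")))  // Right(Some(,))
  }
}
```

The key detail is that a Scala type pattern like `s: String` never matches `null`, so the null literal needs its own case.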
Error (with plan):
```
== Parsed Logical Plan ==
'Project [unresolvedalias('concat_ws(,, null, b, c), None), unresolvedalias('concat_ws(null, a, b), None)]
+- 'UnresolvedRelation [t], [], false

== Analyzed Logical Plan ==
concat_ws(,, NULL, b, c): string, concat_ws(NULL, a, b): string
Project [concat_ws(,, cast(null as array<string>), b, c) AS concat_ws(,, NULL, b, c)#5, concat_ws(cast(null as string), a, b) AS concat_ws(NULL, a, b)#6]
+- SubqueryAlias t
   +- View (`t`, [_1#3])
      +- Relation [_1#3] parquet

== Optimized Logical Plan ==
Project [concat_ws(,, null, b, c) AS concat_ws(,, NULL, b, c)#5, concat_ws(null, a, b) AS concat_ws(NULL, a, b)#6]
+- Relation [_1#3] parquet

== Physical Plan ==
*(1) CometColumnarToRow
+- CometProject [concat_ws(,, NULL, b, c)#5, concat_ws(NULL, a, b)#6], [concat_ws(,, null, b, c) AS concat_ws(,, NULL, b, c)#5, concat_ws(null, a, b) AS concat_ws(NULL, a, b)#6]
   +- CometScan [native_iceberg_compat] parquet [] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/k0/t16s7rgj6gl2x008c266k4vm0000gn/T/spark-53..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>

Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 5) (172.16.2.87 executor driver): org.apache.comet.CometNativeException: Expected string literal, got None.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues
	at org.apache.comet.Native.executePlan(Native Method)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2(CometExecIterator.scala:150)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2$adapted(CometExecIterator.scala:149)
	at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:232)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:149)
	at org.apache.comet.Tracing$.withTrace(Tracing.scala:31)
	at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:147)
	at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:203)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.cometcolumnartorow_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.util.Iterators$.size(Iterators.scala:29)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1953)
	at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1269)
	at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1269)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]