Re: [PR] feat: route Unsupported through codegen dispatch for opt-in serdes [datafusion-comet]

via GitHub Fri, 26 Jun 2026 16:54:57 -0700


mbutrovich commented on code in PR #4728:
URL: https://github.com/apache/datafusion-comet/pull/4728#discussion_r3484640762



##########
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##########
@@ -767,8 +767,14 @@ object QueryPlanSerde extends Logging with CometExprShim 
with CometTypeShim {
       }
       handler.getSupportLevel(expr) match {
         case Unsupported(notes) =>
-          withFallbackReason(expr, notes.getOrElse(""))
-          None
+          // `CodegenDispatchFallback` serdes have no native path for these 
cases either, but the
+          // dispatcher can still run Spark's own `doGenCode` inside the Comet 
pipeline. Try that
+          // before falling the projection back to Spark. No `[COMET-INFO]` 
hint here: unlike

Review Comment:
   The comment explaining why there is no `[COMET-INFO]` hint in the 
`Unsupported` arm is helpful. Did you consider a debug-level log when an 
`Unsupported` case routes through dispatch? Without the opt-in flag there is no 
plan hint, so anyone debugging why a projection stayed native has nothing to go 
on. Not essential, just a thought.



##########
spark/src/main/scala/org/apache/comet/serde/CometExpressionSerde.scala:
##########
@@ -101,14 +105,22 @@ trait CometExpressionSerde[T <: Expression] {
 }
 
 /**
- * Opt-in marker for expression serdes that have a native implementation which 
is `Incompatible`
- * with Spark for some inputs. When such an expression reports `Incompatible` 
and the user has not
- * enabled `allowIncompatible` for it, mixing in this trait routes it through 
the JVM codegen
- * dispatcher (running Spark's own `doGenCode` inside the Comet pipeline) 
instead of falling the
- * projection back to Spark, so it stays native while still matching Spark 
exactly.
+ * Mixin for serdes whose native implementation cannot match Spark for some 
inputs. When
+ * `getSupportLevel` returns `Incompatible` and the user has not enabled 
`allowIncompatible`, the
+ * expression routes through the JVM codegen dispatcher (Spark's own 
`doGenCode` inside the Comet
+ * pipeline) instead of falling the projection back to Spark. When 
`getSupportLevel` returns
+ * `Unsupported`, the expression always routes through the dispatcher -- the 
serde is declaring
+ * "no native path exists for this case; run Spark's code in-pipeline." Spark 
fallback is reserved
+ * for the case where the dispatcher itself cannot handle the expression (e.g. 
the global codegen
+ * flag is off, or the kernel rejects the bound tree).
+ *
+ * Contract for `Unsupported` reasons on a `CodegenDispatchFallback` serde: 
the case must be

Review Comment:
   The expanded contract reads well. Worth noting for future authors that it is 
essentially always satisfiable: any case that reaches a serde already passed 
Spark analysis, so Spark supports it and `doGenCode` can compile it. The only 
way dispatch declines is a kernel limitation in `canHandle`, which falls back 
cleanly. No change needed, just confirming the reasoning holds.



##########
spark/src/test/resources/sql-tests/expressions/array/array_concat.sql:
##########
@@ -24,17 +24,20 @@ CREATE TABLE test_array_concat(c1 array<int>, c2 
array<int>, c3 array<int>, c4 a
 statement
 INSERT INTO test_array_concat VALUES (array(0, 1), array(2, 3), array(), 
array(null), null), (array(1, 2), array(3, 4), array(), array(null), null), 
(array(2, 3), array(4, 5), array(), array(null), null)
 
-query expect_fallback(CONCAT supports only string input parameters)
+-- Concat mixes in CodegenDispatchFallback, so non-string (array) children 
have no native path
+-- but route through the JVM codegen dispatcher (Spark's own Concat.doGenCode 
inside the Comet
+-- pipeline) and stay native while matching Spark exactly.
+query
 SELECT concat(c1, c2) AS x FROM test_array_concat
 
-query expect_fallback(CONCAT supports only string input parameters)

Review Comment:
   Switching these from `expect_fallback` to plain `query` is the right call 
now that they stay native, and the same applies to the other four files. One 
side effect is that the removed `expect_fallback` assertions were the only 
tests pinning the disabled-dispatcher fallback for these cases. Would it be 
worth one test with `spark.comet.exec.scalaUDF.codegen.enabled=false` 
confirming these `Unsupported` cases still fall back to Spark cleanly? That 
locks in the safety net the description mentions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: route Unsupported through codegen dispatch for opt-in serdes [datafusion-comet]

Reply via email to