mbutrovich commented on code in PR #4728:
URL: https://github.com/apache/datafusion-comet/pull/4728#discussion_r3484640762
##########
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##########
@@ -767,8 +767,14 @@ object QueryPlanSerde extends Logging with CometExprShim
with CometTypeShim {
}
handler.getSupportLevel(expr) match {
case Unsupported(notes) =>
- withFallbackReason(expr, notes.getOrElse(""))
- None
+ // `CodegenDispatchFallback` serdes have no native path for these
cases either, but the
+ // dispatcher can still run Spark's own `doGenCode` inside the Comet
pipeline. Try that
+ // before falling the projection back to Spark. No `[COMET-INFO]`
hint here: unlike
Review Comment:
The comment explaining why there is no `[COMET-INFO]` hint in the
`Unsupported` arm is helpful. Did you consider a debug-level log when an
`Unsupported` case routes through dispatch? Without the opt-in flag there is no
plan hint, so anyone debugging why a projection stayed native has nothing to go
on. Not essential, just a thought.
##########
spark/src/main/scala/org/apache/comet/serde/CometExpressionSerde.scala:
##########
@@ -101,14 +105,22 @@ trait CometExpressionSerde[T <: Expression] {
}
/**
- * Opt-in marker for expression serdes that have a native implementation which
is `Incompatible`
- * with Spark for some inputs. When such an expression reports `Incompatible`
and the user has not
- * enabled `allowIncompatible` for it, mixing in this trait routes it through
the JVM codegen
- * dispatcher (running Spark's own `doGenCode` inside the Comet pipeline)
instead of falling the
- * projection back to Spark, so it stays native while still matching Spark
exactly.
+ * Mixin for serdes whose native implementation cannot match Spark for some
inputs. When
+ * `getSupportLevel` returns `Incompatible` and the user has not enabled
`allowIncompatible`, the
+ * expression routes through the JVM codegen dispatcher (Spark's own
`doGenCode` inside the Comet
+ * pipeline) instead of falling the projection back to Spark. When
`getSupportLevel` returns
+ * `Unsupported`, the expression always routes through the dispatcher -- the
serde is declaring
+ * "no native path exists for this case; run Spark's code in-pipeline." Spark
fallback is reserved
+ * for the case where the dispatcher itself cannot handle the expression (e.g.
the global codegen
+ * flag is off, or the kernel rejects the bound tree).
+ *
+ * Contract for `Unsupported` reasons on a `CodegenDispatchFallback` serde:
the case must be
Review Comment:
The expanded contract reads well. Worth noting for future authors that it is
essentially always satisfiable: any case that reaches a serde already passed
Spark analysis, so Spark supports it and `doGenCode` can compile it. The only
way dispatch declines is a kernel limitation in `canHandle`, which falls back
cleanly. No change needed, just confirming the reasoning holds.
##########
spark/src/test/resources/sql-tests/expressions/array/array_concat.sql:
##########
@@ -24,17 +24,20 @@ CREATE TABLE test_array_concat(c1 array<int>, c2
array<int>, c3 array<int>, c4 a
statement
INSERT INTO test_array_concat VALUES (array(0, 1), array(2, 3), array(),
array(null), null), (array(1, 2), array(3, 4), array(), array(null), null),
(array(2, 3), array(4, 5), array(), array(null), null)
-query expect_fallback(CONCAT supports only string input parameters)
+-- Concat mixes in CodegenDispatchFallback, so non-string (array) children
have no native path
+-- but route through the JVM codegen dispatcher (Spark's own Concat.doGenCode
inside the Comet
+-- pipeline) and stay native while matching Spark exactly.
+query
SELECT concat(c1, c2) AS x FROM test_array_concat
-query expect_fallback(CONCAT supports only string input parameters)
Review Comment:
Switching these from `expect_fallback` to plain `query` is the right call
now that they stay native, and the same applies to the other four files. One
side effect is that the removed `expect_fallback` assertions were the only
tests pinning the disabled-dispatcher fallback for these cases. Would it be
worth one test with `spark.comet.exec.scalaUDF.codegen.enabled=false`
confirming these `Unsupported` cases still fall back to Spark cleanly? That
locks in the safety net the description mentions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]