andygrove commented on code in PR #2704:
URL: https://github.com/apache/datafusion-comet/pull/2704#discussion_r2500292927
##########
docs/source/contributor-guide/adding_a_new_expression.md:
##########
@@ -41,26 +41,172 @@ Once you know what you want to add, you'll need to update the query planner to r
### Adding the Expression in Scala
-The `QueryPlanSerde` object has a method `exprToProto`, which is responsible for converting a Spark expression to a protobuf expression. Within that method is an `exprToProtoInternal` method that contains a large match statement for each expression type. You'll need to add a new case to this match statement for your new expression.
+DataFusion Comet uses a framework based on the `CometExpressionSerde` trait for converting Spark expressions to protobuf. Instead of a large match statement, each expression type has its own serialization handler. For aggregate expressions, use the `CometAggregateExpressionSerde` trait instead.
+
+#### Creating a CometExpressionSerde Implementation
+
+First, create an object that extends `CometExpressionSerde[T]`, where `T` is the Spark expression type. This is typically added to one of the serde files in `spark/src/main/scala/org/apache/comet/serde/` (e.g., `math.scala`, `strings.scala`, `arrays.scala`).
For example, the `unhex` function looks like this:
```scala
-case e: Unhex =>
-  val unHex = unhexSerde(e)
+object CometUnhex extends CometExpressionSerde[Unhex] {
+  override def convert(
+      expr: Unhex,
+      inputs: Seq[Attribute],
+      binding: Boolean): Option[ExprOuterClass.Expr] = {
+    val childExpr = exprToProtoInternal(expr.child, inputs, binding)
+    val failOnErrorExpr = exprToProtoInternal(Literal(expr.failOnError), inputs, binding)
+
+    val optExpr =
+      scalarFunctionExprToProtoWithReturnType(
+        "unhex",
+        expr.dataType,
+        false,
+        childExpr,
+        failOnErrorExpr)
+    optExprWithInfo(optExpr, expr, expr.child)
+  }
+}
+```
+
+The `CometExpressionSerde` trait provides three methods you can override:
+
+* `convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[Expr]` - **Required**. Converts the Spark expression to protobuf. Return `None` if the expression cannot be converted.
+* `getSupportLevel(expr: T): SupportLevel` - Optional. Returns the level of support for the expression. See the "Using getSupportLevel" section below for details.
+* `getExprConfigName(expr: T): String` - Optional. Returns a short name for configuration keys. Defaults to the Spark class name.
+
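+To make the shape concrete, the trait implied by these methods looks roughly like the sketch below. This is an approximation, not the actual definition, which lives in the `org.apache.comet.serde` package and may differ in bounds and defaults:
+
+```scala
+// Rough sketch of the trait described above (not the actual definition).
+trait CometExpressionSerde[T <: Expression] {
+  // Required: convert the Spark expression to protobuf, or return None.
+  def convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[ExprOuterClass.Expr]
+
+  // Optional: how well Comet supports this expression (defaults to Compatible).
+  def getSupportLevel(expr: T): SupportLevel = Compatible(None)
+
+  // Optional: short name used in config keys (defaults to the Spark class name).
+  def getExprConfigName(expr: T): String = expr.getClass.getSimpleName
+}
+```
+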
+For simple scalar functions that map directly to a DataFusion function, you can use the built-in `CometScalarFunction` implementation:
+
+```scala
+classOf[Cos] -> CometScalarFunction("cos")
+```
+
+#### Registering the Expression Handler
+
+Once you've created your `CometExpressionSerde` implementation, register it in `QueryPlanSerde.scala` by adding it to the appropriate expression map (e.g., `mathExpressions`, `stringExpressions`, `predicateExpressions`):
+
+```scala
+private val mathExpressions: Map[Class[_ <: Expression], CometExpressionSerde[_]] = Map(
+ // ... other expressions ...
+ classOf[Unhex] -> CometUnhex,
+ classOf[Hex] -> CometHex)
+```
+
+The `exprToProtoInternal` method will automatically use this mapping to find and invoke your handler when it encounters the corresponding Spark expression type.
+
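+As a hypothetical illustration of that dispatch (the real logic in `QueryPlanSerde.scala` handles many more cases), the lookup might be sketched as follows; `expressionMappings` is a placeholder name:
+
+```scala
+// Hypothetical sketch of the handler lookup, not the actual implementation.
+def exprToProtoInternal(
+    expr: Expression,
+    inputs: Seq[Attribute],
+    binding: Boolean): Option[ExprOuterClass.Expr] = {
+  expressionMappings.get(expr.getClass) match {
+    case Some(serde) =>
+      serde.asInstanceOf[CometExpressionSerde[Expression]].convert(expr, inputs, binding)
+    case None =>
+      None // no handler registered: Spark falls back to native execution
+  }
+}
+```
+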
+A few things to note:
+
+* Inside `convert`, call `exprToProtoInternal` recursively on the child expressions so that they are also converted to protobuf; if a child cannot be converted, the expression as a whole cannot be converted either (see the sketch after this list).
+* `scalarFunctionExprToProtoWithReturnType` is for scalar functions that need to specify return type information. Your expression may use a different method depending on the type of expression.
+* Use helper methods like `createBinaryExpr` and `createUnaryExpr` from `QueryPlanSerde` for common expression patterns.
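+
+The sketch below illustrates the child-conversion pattern from the first note. `MyExpr` and the function name `"my_func"` are placeholders, and the helper calls are modeled on the `unhex` example above rather than taken from the codebase:
+
+```scala
+// Hypothetical serde for a unary expression, shown only to illustrate
+// the pattern of converting children and bailing out with None.
+object CometMyExpr extends CometExpressionSerde[MyExpr] {
+  override def convert(
+      expr: MyExpr,
+      inputs: Seq[Attribute],
+      binding: Boolean): Option[ExprOuterClass.Expr] = {
+    // Convert the child first; if it cannot be converted, neither can
+    // this expression, and Spark will execute it natively instead.
+    val childExpr = exprToProtoInternal(expr.child, inputs, binding)
+    if (childExpr.isEmpty) {
+      return None
+    }
+
+    val optExpr =
+      scalarFunctionExprToProtoWithReturnType("my_func", expr.dataType, false, childExpr)
+    optExprWithInfo(optExpr, expr, expr.child)
+  }
+}
+```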
+
+#### Using getSupportLevel
+
+The `getSupportLevel` method allows you to control whether an expression should be executed by Comet based on various conditions such as data types, parameter values, or other expression-specific constraints. This is particularly useful when:
+
+1. Your expression only supports specific data types
+2. Your expression has known incompatibilities with Spark's behavior
+3. Your expression has edge cases that aren't yet supported
+
+The method returns one of three `SupportLevel` values:
+
+* **`Compatible(notes: Option[String] = None)`** - Comet supports this expression with full compatibility with Spark, or with known differences in specific edge cases that are unlikely to be an issue for most users. This is the default if you don't override `getSupportLevel`.
+* **`Incompatible(notes: Option[String] = None)`** - Comet supports this expression, but results can differ from Spark. The expression will only be used if `spark.comet.expr.allowIncompatible=true` or the expression-specific config `spark.comet.expr.<exprName>.allowIncompatible=true` is set.
+* **`Unsupported(notes: Option[String] = None)`** - Comet does not support this expression under the current conditions. The expression will not be used, and Spark will fall back to its native execution.
+
+All three support levels accept an optional `notes` parameter to provide additional context about the support level.
+
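+As a quick illustration before the fuller examples below, a hypothetical `getSupportLevel` override that gates on input type might look like this (`MyExpr` and the specific type rules are placeholders):
+
+```scala
+// Hypothetical override; the type rules shown are illustrative only.
+override def getSupportLevel(expr: MyExpr): SupportLevel = {
+  expr.child.dataType match {
+    case _: DecimalType =>
+      Incompatible(Some("decimal results may differ from Spark in edge cases"))
+    case _: NumericType =>
+      Compatible()
+    case other =>
+      Unsupported(Some(s"$other inputs are not supported"))
+  }
+}
+```
+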
+##### Examples
Review Comment:
I think we need to keep the indentation level because this is a subsection of `#### Using getSupportLevel`, which uses 4 `#`.

The rendered output for the documentation site seems ok:

<img width="621" height="232" alt="render" src="https://github.com/user-attachments/assets/020b4765-8987-4331-955c-891eb08b408f" />