andygrove opened a new pull request, #4286:
URL: https://github.com/apache/datafusion-comet/pull/4286

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   `substring_index` is a commonly used Spark string function that was not yet 
supported natively by Comet, causing fallback to Spark execution.
   
   ## What changes are included in this PR?
   
   Adds native support for the `substring_index(str, delim, count)` expression 
by delegating to DataFusion's built-in `substr_index` function (aliased as 
`substring_index`), which has identical semantics to Spark. The only adaptation 
needed is casting the `count` argument from `IntegerType` to `LongType` to 
match DataFusion's function signature.
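
To make the semantics and the `count` widening concrete, here is a minimal Rust sketch of a reference model. It is not the Comet serde or DataFusion's `substr_index` source; the names `substring_index` and `substring_index_i32` are hypothetical. It assumes that a zero `count` or an empty delimiter yields an empty string, that a `count` larger than the number of delimiters returns the whole input, and that delimiter matches are counted without overlap.

```rust
/// Hypothetical reference model of `substring_index(str, delim, count)`.
/// Not taken from Comet or DataFusion; written only to illustrate the semantics.
fn substring_index(s: &str, delim: &str, count: i64) -> String {
    // Assumption: empty delimiter or zero count produces an empty string.
    if delim.is_empty() || count == 0 {
        return String::new();
    }
    // Number of delimiter occurrences still to be skipped.
    let mut remaining = count.unsigned_abs();
    if count > 0 {
        // Scan left to right; return everything before the count-th delimiter.
        let mut from = 0usize;
        while let Some(rel) = s[from..].find(delim) {
            let idx = from + rel;
            remaining -= 1;
            if remaining == 0 {
                return s[..idx].to_string();
            }
            // Assumption: matches do not overlap, so resume after this delimiter.
            from = idx + delim.len();
        }
    } else {
        // Scan right to left; return everything after the count-th delimiter.
        let mut end = s.len();
        while let Some(idx) = s[..end].rfind(delim) {
            remaining -= 1;
            if remaining == 0 {
                return s[idx + delim.len()..].to_string();
            }
            end = idx;
        }
    }
    // Fewer delimiters than |count|: the whole input is returned.
    s.to_string()
}

/// Mirrors the adaptation described above: a 32-bit count (Spark `IntegerType`)
/// is widened losslessly to 64 bits (`LongType`) before the call.
fn substring_index_i32(s: &str, delim: &str, count: i32) -> String {
    substring_index(s, delim, i64::from(count))
}
```

Under these assumptions, `substring_index_i32("www.apache.org", ".", 2)` yields `www.apache` and a count of `-2` yields `apache.org`. Because the widening from 32 to 64 bits is lossless, extreme counts such as `i32::MAX` simply fall through to the "whole input" case.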
   
   Changes:
   - Added `CometSubstringIndex` serde in `strings.scala`
   - Registered in `QueryPlanSerde.stringExpressions` map
   - Added a comprehensive Comet SQL test covering column/literal arguments, NULL 
propagation, empty strings, multi-character delimiters, multibyte UTF-8, 
boundary delimiters, large count values, and dictionary encoding (via 
`ConfigMatrix`)
   - Marked `substring_index` as supported in the expressions support doc
   
   The `implement-comet-expression` skill was used to scaffold this 
implementation.
   
   ## How are these changes tested?
   
   A Comet SQL test at 
`spark/src/test/resources/sql-tests/expressions/string/substring_index.sql` 
runs with `ConfigMatrix: parquet.enable.dictionary=false,true` (two test 
configurations). It covers:
   - All-column, all-literal, and mixed column/literal argument combinations
   - NULL in each argument position
   - Empty string and empty delimiter
   - Positive, negative, and zero count
   - Count exceeding number of delimiters
   - Multi-character delimiters
   - Delimiter not found in string
   - Multibyte UTF-8 characters (Chinese)
   - Delimiter at start/end of string
   - Delimiter equal to the full string
   - Large count values (INT_MAX, -INT_MAX)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

