andygrove opened a new pull request, #4461:
URL: https://github.com/apache/datafusion-comet/pull/4461

   ## Which issue does this PR close?
   
   <!-- No dedicated tracking issue; this is one of the rolling per-category 
audits. -->
   
   Closes #.
   
   ## Rationale for this change
   
   Following the same pattern as #4436 (`any`), #4437 (`bit_and`), and the 
earlier date/time audit, this PR audits every supported string expression in 
Comet against Spark 3.4.3, 3.5.8, and 4.0.1, records the findings inline in the 
support doc, and applies the support-level consistency fixes the audit surfaced.
   
   ## What changes are included in this PR?
   
   Add per-version audit sub-bullets to all 37 supported string SQL function 
names in `docs/source/contributor-guide/spark_expressions_support.md`. The 
notes record:
   
   - whether the Spark class is identical across 3.4.3 / 3.5.8 / 4.0.1
   - Spark 4.0 changes (collation widening via `StringTypeWithCollation`; 
`CollationSupport` routing for `Upper`/`Lower`/`InitCap` and the 
`startswith`/`endswith`/`contains`/`instr`/`replace`/`translate`/`trim` family; 
`NullIntolerant` -> `nullIntolerant: Boolean` refactor; `BinaryPad` rewrite for 
`lpad`/`rpad`)
   - known divergences between Comet and Spark (`initcap` hyphen handling; 
`replace` with empty search; `repeat` with negative count; `decode` charset and 
legacy-flag gaps; `BinaryType` wiring gaps in `bit_length` / `octet_length`; 
`regexp_replace` pos > 1 restriction)
   
   Apply the mechanical support-level consistency fixes surfaced by the audit 
(all in `spark/src/main/scala/org/apache/comet/serde/strings.scala`):
   
   - `CometInitCap`: `Incompatible(None)` -> `Incompatible(Some(reason))` via a 
shared `private val`; drop the dead `convert` override.
   - `CometLength`: extract the `BinaryType` reason into a shared `private val` 
so the `getSupportLevel` branch and `getUnsupportedReasons` stay in sync.
   - `CometRegExpReplace`: pass the existing reasons into 
`Incompatible(Some(...))` / `Unsupported(Some(...))` via shared `private val`s 
so EXPLAIN matches the compatibility-guide text.
   - `CometStringLPad` / `CometStringRPad`: share the two reason strings via a 
small `PadReasons` companion so `getSupportLevel` and `getUnsupportedReasons` 
cannot drift; align wording with backticks.
   - `CometSubstring` / `CometLeft` / `CometRight`: lift the literal-argument 
fallback out of `convert` and into `getSupportLevel` so the dispatcher handles 
it uniformly; replace the convert-time `withInfo` with a declared `Unsupported` 
reason via a shared `private val`.
   
   The audit was driven by 7 parallel agents using the `audit-comet-expression` 
skill, each handling a related group (case conversion + initcap; length family; 
trim; left/right/substring/substring_index; lpad/rpad; simple scalar funcs; 
complex serdes including concat_ws/regexp_replace/repeat/split/decode).
   
   Higher-risk findings (e.g. lifting `spark.comet.caseConversion.enabled` 
gating out of `CometCaseConversionBase.convert`, promoting `CometStringRepeat` 
from `Compatible` to `Incompatible`, adding non-default-collation guards for 
the simple scalar string funcs) are intentionally left as prose notes in the 
support doc rather than acted on here, since each carries a semantics decision 
that warrants its own PR.
   
   ## How are these changes tested?
   
   - `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite 
expressions/string/" -Dtest=none` (42 tests pass)
   - `./mvnw test -Dsuites="org.apache.comet.CometStringExpressionSuite" 
-Dtest=none` (33 tests pass)
   - Updated the affected fallback-reason assertions in `left.sql`, 
`string_lpad.sql`, `string_rpad.sql`, and `CometStringExpressionSuite.scala` to 
match the new shared reason strings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to