kosiew opened a new pull request, #22388:
URL: https://github.com/apache/datafusion/pull/22388
## Which issue does this PR close?
Closes #22163
## Rationale for this change
`ConversionSpecifier::format` in
`datafusion/spark/src/function/string/format_string.rs` contained substantial
duplication across integer scalar variants (`Int8` through `UInt64`). Each
variant repeated nearly identical formatting logic for `%d`, `%x`, `%o`, `%s`,
and `%c`.
This refactor consolidates integer formatting behavior into shared helper
paths to reduce maintenance overhead and lower the risk of drift between signed
and unsigned integer handling, while preserving existing Spark-compatible
behavior and null semantics.
## What changes are included in this PR?
* Introduced `IntegerValue` and `IntegerFormatValue` helper enums to
normalize signed and unsigned integer formatting behavior.
* Added a shared `format_integer` helper on `ConversionSpecifier` to
centralize integer conversion dispatch.
* Consolidated `%c` formatting through `IntegerValue::to_char()` using the
existing `signed_to_char` and `unsigned_to_char` helpers.
* Replaced duplicated per-type integer formatting match arms with shared
dispatch logic for:
  * `%d`
  * `%x`
  * `%o`
  * `%s`
  * `%c`
* Preserved existing null handling and invalid conversion error behavior.
* Added a table-driven regression test covering formatting equivalence
across signed and unsigned integer widths, including `%c` and null handling.
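
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the normalization idea. It is not the code in this PR: the `IntegerValue` name comes from the description above, but its variants, the `Conversion` enum, and the free-function form of `format_integer` are assumptions made for illustration. `%c`, flags, width, and null handling are left out.

```rust
/// Hypothetical stand-in for the PR's `IntegerValue`; the real definition may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntegerValue {
    Signed(i64),
    Unsigned(u64),
}

// Widen every integer width into one of the two variants above.
macro_rules! impl_from {
    ($variant:ident, $wide:ty, $($t:ty),+) => {
        $(impl From<$t> for IntegerValue {
            fn from(v: $t) -> Self {
                IntegerValue::$variant(<$wide>::from(v))
            }
        })+
    };
}
impl_from!(Signed, i64, i8, i16, i32, i64);
impl_from!(Unsigned, u64, u8, u16, u32, u64);

/// Hypothetical subset of conversions (`%c` is omitted from this sketch).
#[derive(Debug, Clone, Copy)]
enum Conversion {
    Decimal, // %d
    Hex,     // %x
    Octal,   // %o
    Str,     // %s
}

/// One shared dispatch instead of one match arm per integer width.
/// Caveat: widening to 64 bits means a negative narrow value would print as
/// 64-bit two's complement under `%x`/`%o`; width-aware handling is not modeled.
fn format_integer(conv: Conversion, value: IntegerValue) -> String {
    match (conv, value) {
        (Conversion::Decimal | Conversion::Str, IntegerValue::Signed(v)) => v.to_string(),
        (Conversion::Decimal | Conversion::Str, IntegerValue::Unsigned(v)) => v.to_string(),
        (Conversion::Hex, IntegerValue::Signed(v)) => format!("{v:x}"),
        (Conversion::Hex, IntegerValue::Unsigned(v)) => format!("{v:x}"),
        (Conversion::Octal, IntegerValue::Signed(v)) => format!("{v:o}"),
        (Conversion::Octal, IntegerValue::Unsigned(v)) => format!("{v:o}"),
    }
}
```

With this shape, a call such as `format_integer(Conversion::Hex, IntegerValue::from(255u8))` and the same call with `255i32` go through the same arm family, which is the drift-reduction property the refactor is after.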
## Are these changes tested?
Yes.
Added:
* `test_integer_formatting_across_widths`
Existing `%c` validation tests remain in place, including coverage for:
* Invalid Unicode code points
* Surrogate ranges
* Negative values
* Valid `%c` formatting behavior
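
As a rough illustration of what those `%c` cases exercise, the sketch below maps integers to Unicode scalar values with the standard library. The helper names match the ones mentioned in this PR, but the signatures and bodies here are assumptions; the actual helpers may, for example, return a `Result` with a DataFusion error instead of an `Option`.

```rust
/// Assumed shape of a signed `%c` helper: negative values and values above
/// `u32::MAX` fail the `u32` conversion, and `char::from_u32` rejects
/// surrogates (0xD800..=0xDFFF) and anything above 0x10FFFF.
fn signed_to_char(value: i64) -> Option<char> {
    u32::try_from(value).ok().and_then(char::from_u32)
}

/// Assumed shape of the unsigned counterpart; only the input type differs.
fn unsigned_to_char(value: u64) -> Option<char> {
    u32::try_from(value).ok().and_then(char::from_u32)
}
```

In this sketch, `signed_to_char(65)` yields `Some('A')`, while a negative input, a surrogate such as `0xD800`, or a value past `0x10FFFF` yields `None`.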
Suggested focused validation:
* `cargo test -p datafusion-spark format_char`
## Are there any user-facing changes?
No.
This PR is a structural refactor intended to preserve existing formatting
behavior and Spark compatibility semantics.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]