kosiew opened a new issue, #22163:
URL: https://github.com/apache/datafusion/issues/22163
## Overview
The `ConversionSpecifier::format` method in
`datafusion/spark/src/function/string/format_string.rs` contains highly
repetitive branches for handling `%c` (character) conversion across all signed
and unsigned integer scalar types (Int8, Int16, Int32, Int64, UInt8, UInt16,
UInt32, UInt64). Each branch duplicates the same validation and conversion
logic, making the code difficult to maintain and prone to inconsistency when
invariants change.
## Problem Statement
Currently, the code has near-identical match arms for each numeric type:
- `Int8`, `Int16`, `Int32`, `Int64`: All call `signed_to_char(*value as i64)`
- `UInt8`, `UInt16`, `UInt32`, `UInt64`: All call `unsigned_to_char(*value
as u64)`
**Maintenance burden**: When a new invariant or fix must be applied to `%c`
handling (as happened in PR #22077), every branch must be updated individually,
increasing the risk of missed updates and regressions.
**Example of duplication**:
```rust
(ConversionType::CharLower | ConversionType::CharUpper, Some(value)) => {
self.format_char(string, signed_to_char(*value as i64)?)
}
```
This pattern appears ~8 times with only the type changed.
## Proposed refactor
Create a helper dispatch mechanism that unifies the conversion logic once,
eliminating duplication:
## Context
Encountered this while reviewing #22077
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]