kosiew opened a new issue, #22163:
URL: https://github.com/apache/datafusion/issues/22163

   ## Overview
   The `ConversionSpecifier::format` method in 
`datafusion/spark/src/function/string/format_string.rs` contains highly 
repetitive branches for handling `%c` (character) conversion across all signed 
and unsigned integer scalar types (Int8, Int16, Int32, Int64, UInt8, UInt16, 
UInt32, UInt64). Each branch duplicates the same validation and conversion 
logic, making the code difficult to maintain and prone to inconsistency when 
invariants change.
   
   ## Problem Statement
   Currently, the code has near-identical match arms for each numeric type:
   - `Int8`, `Int16`, `Int32`, `Int64`: All call `signed_to_char(*value as i64)`
   - `UInt8`, `UInt16`, `UInt32`, `UInt64`: All call `unsigned_to_char(*value 
as u64)`
   
   **Maintenance burden**: When a new invariant or fix must be applied to `%c` 
handling (as happened in PR #22077), every branch must be updated individually, 
increasing the risk of missed updates and regressions.
   
   **Example of duplication**:
   ```rust
   (ConversionType::CharLower | ConversionType::CharUpper, Some(value)) => {
       self.format_char(string, signed_to_char(*value as i64)?)
   }
   ```
   This pattern appears ~8 times with only the type changed.
   
   ## Proposed refactor
   Create a helper dispatch mechanism that unifies the conversion logic once, 
eliminating duplication:
   
   ## Context
   Encountered this while reviewing #22077


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to