kosiew opened a new pull request, #22990:
URL: https://github.com/apache/datafusion/pull/22990

   ## Which issue does this PR close?
   
   * Part of #22688
   
   ## Rationale for this change
   
   `GenericStringArrayBuilder` only exposed infallible append APIs, which panic 
when a string offset exceeds the limit of the underlying offset type (`i32` for 
`Utf8`, `i64` for `LargeUtf8`). String functions such as `replace`, 
`replace_view`, and the generic `initcap` path relied on these APIs, so an 
extremely large output could panic instead of returning a recoverable 
`DataFusionError`.
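
   To make the distinction concrete, here is a minimal sketch of the two styles 
of offset conversion. Both function names and the `String` error type are 
assumptions made for this illustration; they are not the signatures used in 
the PR, which reports overflow through `DataFusionError`.

```rust
/// Infallible style: the conversion panics if `len` does not fit in `i32`.
/// (Hypothetical helper, shown only to illustrate the failure mode.)
pub fn offset_or_panic(len: usize) -> i32 {
    i32::try_from(len).expect("byte length exceeds i32 offset range")
}

/// Fallible style: the same conversion, but overflow becomes an `Err`
/// that the caller can propagate with `?`.
/// (A plain `String` stands in for the real error type here.)
pub fn try_offset(len: usize) -> Result<i32, String> {
    i32::try_from(len)
        .map_err(|_| format!("byte length {len} exceeds i32 offset range"))
}
```

   With the fallible form, the caller decides how to handle the failure; a 
query can surface it as an ordinary error instead of aborting the thread.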
   
   This change introduces fallible builder APIs and migrates selected string 
UDFs to use them so offset overflow is reported as an error rather than causing 
a panic. 
   
   ## What changes are included in this PR?
   
   * Add overflow-checked helper functions to `GenericStringArrayBuilder`:
   
     * `try_offset`
     * `try_push_offset_for_len`
     * `try_append_bytes`
   * Add fallible append APIs:
   
     * `try_append_value`
     * `try_append_placeholder`
     * `try_append_byte_map`
     * `try_append_with`
   * Introduce a shared overflow error path that returns a `DataFusionError` 
instead of panicking.
   * Keep existing infallible append APIs for compatibility while documenting 
that new overflow-sensitive call sites should prefer the `try_*` variants.
   * Refactor `replace` and `replace_view` to share a generic `replace_arrays` 
implementation.
   * Change `apply_replace` to return `Result<()>` and propagate errors from 
builder operations.
   * Update `replace`/`replace_view` to use the new fallible builder APIs and 
propagate errors with `?`.
   * Update the generic `Utf8`/`LargeUtf8` path in `initcap` to use 
`try_append_placeholder` and `try_append_value`.
   * Add rollback handling in `try_append_with` so builder state is restored if 
offset validation fails. 
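
   The changes listed above can be sketched with a toy builder. This is an 
illustration under stated assumptions, not the code in this PR: 
`ToyStringBuilder`, `OffsetOverflow`, `byte_len`, and `replace_all` are 
hypothetical names, the closure signature of `try_append_with` is guessed, and 
the real implementation returns a `DataFusionError`. The builder is generic 
over the offset type so that overflow can be triggered with a small type such 
as `i8` rather than gigabytes of data.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the PR returns a `DataFusionError`.
#[derive(Debug, PartialEq)]
pub struct OffsetOverflow {
    /// Total byte length that did not fit in the offset type.
    pub needed: usize,
}

impl fmt::Display for OffsetOverflow {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "string offset overflow: {} bytes", self.needed)
    }
}

impl std::error::Error for OffsetOverflow {}

/// Toy builder: one contiguous byte buffer plus one end offset per value.
pub struct ToyStringBuilder<O> {
    values: Vec<u8>,
    offsets: Vec<O>,
}

impl<O: TryFrom<usize> + Copy> ToyStringBuilder<O> {
    pub fn new() -> Self {
        let zero = O::try_from(0usize)
            .ok()
            .expect("zero fits in any integer offset type");
        Self { values: Vec::new(), offsets: vec![zero] }
    }

    /// Convert a byte length to an offset, reporting overflow as an error.
    fn try_offset(len: usize) -> Result<O, OffsetOverflow> {
        O::try_from(len).map_err(|_| OffsetOverflow { needed: len })
    }

    /// Validate the new end offset first, then write; nothing changes on error.
    pub fn try_append_value(&mut self, s: &str) -> Result<(), OffsetOverflow> {
        let end = self
            .values
            .len()
            .checked_add(s.len())
            .ok_or(OffsetOverflow { needed: usize::MAX })?;
        let offset = Self::try_offset(end)?;
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(offset);
        Ok(())
    }

    /// Let a closure write bytes directly, then validate the resulting offset.
    /// If validation fails, truncate the buffer back to where it started.
    pub fn try_append_with<F>(&mut self, write: F) -> Result<(), OffsetOverflow>
    where
        F: FnOnce(&mut Vec<u8>),
    {
        let start = self.values.len();
        write(&mut self.values);
        match Self::try_offset(self.values.len()) {
            Ok(offset) => {
                self.offsets.push(offset);
                Ok(())
            }
            Err(e) => {
                self.values.truncate(start); // rollback
                Err(e)
            }
        }
    }

    /// Number of values appended so far.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Number of bytes currently held in the value buffer.
    pub fn byte_len(&self) -> usize {
        self.values.len()
    }
}

/// A `replace`-like caller: each append can fail, and `?` propagates the error.
pub fn replace_all<O: TryFrom<usize> + Copy>(
    inputs: &[&str],
    from: &str,
    to: &str,
) -> Result<ToyStringBuilder<O>, OffsetOverflow> {
    let mut builder = ToyStringBuilder::new();
    for s in inputs {
        builder.try_append_value(&s.replace(from, to))?;
    }
    Ok(builder)
}
```

   In this sketch, `try_append_value` can check the offset before touching the 
buffer because the length is known up front, whereas `try_append_with` only 
learns the final length after the closure has written, which is why it needs 
an explicit rollback step.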
   
   ## Are these changes tested?
   
   Yes.
   
   Added tests in `datafusion/functions/src/strings.rs`:
   
   * `generic_string_builder_try_append_success_path`
   * `generic_string_builder_mixed_append_success_path`
   * `generic_string_builder_try_offset_overflow`
   * `generic_string_builder_try_append_bytes_overflow`
   
   Existing `replace` and `initcap` tests remain in place and the migrated code 
paths continue to be exercised by those test suites. 
   
   ## Are there any user-facing changes?
   
   Yes.
   
   For extreme string outputs that exceed the offset limits of the underlying 
string array type, affected functions now return a `DataFusionError` instead of 
panicking. Normal behavior and results are otherwise unchanged. 
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

