vanshaj2023 commented on issue #49310:
URL: https://github.com/apache/arrow/issues/49310#issuecomment-3916432040

   Hi @kccqzy  
   I'd like to work on this issue.
   
   **Implementation Approach:**
   
   The segfault occurs in `pyarrow.compute.if_else` when both value arguments 
are `string` arrays but the result would exceed the capacity of the `string` 
type (about 2 GiB of character data, since `string` uses 32-bit offsets). I 
plan to:
   
   1. Add validation in the `if_else` kernel to detect when output size would 
exceed string type limits
   2. Automatically promote string → large_string when necessary, similar to 
how other Arrow kernels handle type promotion
   3. Add proper error handling so that an `ArrowInvalid` exception is raised 
instead of a segfault when auto-promotion isn't feasible
   
   I'll look into the C++ implementation of the if_else kernel (likely in 
`cpp/src/arrow/compute/kernels/scalar_if_else.cc`) to add size checks and type 
promotion logic.
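
As a rough illustration of the size check in step 1, here is a minimal pure-Python sketch. It is not the Arrow kernel code: the function names `output_data_bytes` and `needs_large_string` are hypothetical, and the inputs are plain lists of per-element byte lengths rather than Arrow arrays. It assumes a null condition produces a null output that contributes no bytes.

```python
from typing import Optional, Sequence

# Largest value a signed 32-bit offset can hold; `string` arrays use int32 offsets.
INT32_MAX = 2**31 - 1


def output_data_bytes(
    cond: Sequence[Optional[bool]],
    left_lengths: Sequence[int],
    right_lengths: Sequence[int],
) -> int:
    """Sum the byte lengths of the values that if_else would select.

    Hypothetical helper for illustration only. A None condition is treated
    as a null output slot and adds no bytes.
    """
    if not (len(cond) == len(left_lengths) == len(right_lengths)):
        raise ValueError("all inputs must have the same length")
    total = 0
    for c, left_len, right_len in zip(cond, left_lengths, right_lengths):
        if c is None:
            continue
        total += left_len if c else right_len
    return total


def needs_large_string(
    cond: Sequence[Optional[bool]],
    left_lengths: Sequence[int],
    right_lengths: Sequence[int],
) -> bool:
    """Return True when the selected data would not fit behind int32 offsets."""
    return output_data_bytes(cond, left_lengths, right_lengths) > INT32_MAX


# Each input holds 2**30 + 1 bytes and fits in a `string` array on its own,
# but the selected values add up to 2**31 bytes, one more than INT32_MAX.
overflow = needs_large_string([True, False], [2**30, 1], [1, 2**30])
```

The point of the example is that checking each input separately is not enough: the output size depends on which side the condition picks for every row.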
   
   **Questions:**
   - Does this approach sound correct?
   - Should we auto-promote to large_string or raise an exception and let users 
explicitly cast?
   - Any specific areas of the codebase I should focus on?
   
   Could you please assign this issue to me if the approach looks good?
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
