vanshaj2023 commented on issue #49310: URL: https://github.com/apache/arrow/issues/49310#issuecomment-3916432040
Hi @kccqzy, I'd like to work on this issue.

**Implementation Approach:**

The segfault occurs in `pyarrow.compute.if_else` when the arguments are of `string` type but the result exceeds the capacity of that type (the 2GB limit that comes from its 32-bit offsets). I plan to:

1. Add validation in the `if_else` kernel to detect when the output size would exceed the `string` type limit.
2. Automatically promote `string` → `large_string` when necessary, similar to how other Arrow kernels handle type promotion.
3. Add proper error handling that raises an `ArrowInvalid` exception instead of segfaulting if auto-promotion isn't feasible.

I'll look into the C++ implementation of the `if_else` kernel (likely in `cpp/src/arrow/compute/kernels/scalar_if_else.cc`) to add the size checks and type promotion logic.

**Questions:**

- Does this approach sound correct?
- Should we auto-promote to `large_string`, or raise an exception and let users cast explicitly?
- Are there any specific areas of the codebase I should focus on?

Could you please assign this issue to me if the approach looks good? Thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
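
For illustration only, the size check described in step 1 of the comment above could look roughly like the pure-Python sketch below. It is not the Arrow implementation: the helper names `selected_bytes` and `choose_output_type` are hypothetical, and the inputs are assumed to be per-row byte lengths rather than real Arrow arrays. The only fact it relies on is that `string` arrays use signed 32-bit offsets, so the total value bytes cannot exceed 2**31 - 1.

```python
# Hypothetical sketch of the proposed pre-check; not actual Arrow code.
INT32_MAX = 2**31 - 1  # largest byte offset a 32-bit `string` array can hold


def selected_bytes(cond, left_lengths, right_lengths):
    """Total value bytes an if_else-style selection would write.

    cond          -- iterable of True / False / None (None means a null condition)
    left_lengths  -- byte length of each row of the "true" argument
    right_lengths -- byte length of each row of the "false" argument
    """
    total = 0
    for c, left, right in zip(cond, left_lengths, right_lengths):
        if c is None:
            continue  # a null condition yields a null output slot, so no bytes
        total += left if c else right
    return total


def choose_output_type(cond, left_lengths, right_lengths):
    """Return the narrowest string type name whose offsets can hold the result."""
    total = selected_bytes(cond, left_lengths, right_lengths)
    return "large_string" if total > INT32_MAX else "string"
```

Whether the kernel should promote on its own or raise `ArrowInvalid` when this check fails is exactly the open question in the comment; the sketch only shows where the decision point would sit.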
