DenisTarasyuk commented on issue #46708: URL: https://github.com/apache/arrow/issues/46708#issuecomment-2955947276
Additional information. I think this issue happens due to undefined behavior in castDECIMAL_utf8. This code does not set out_high, out_low. ``` int32_t status = gdv_fn_dec_from_string(context, in, in_length, &precision_from_str, &scale_from_str, &dec_high_from_str, &dec_low_from_str); if (status != 0) { return; } ``` And LLVM decides to optimize out some parts of code due to this behavior. In linked PR there is test that reproduces issue. Here I link some IR code. [ir_bug_raw.txt](https://github.com/user-attachments/files/20656070/ir_bug_raw.txt) is IR code with bug reproduced. This IR code '_raw' was captured before any optimisations applied by LLVM. [ir_fixed_raw.txt](https://github.com/user-attachments/files/20656068/ir_fixed_raw.txt) is non optimized IR with fix for out_high, out_low set to 0. Then I used LLVM opt command like this: `/llvm/bin/opt -passes='default<O1>' -debug-pass-manager -S ir_fixed_raw.txt -o ir_fixed_O1.txt` This produces optimised IR code. Here are original and fixed IR with O1 optimisation level. They both work fine. [ir_bug_O1.txt](https://github.com/user-attachments/files/20656069/ir_bug_O1.txt) [ir_fixed_O1.txt](https://github.com/user-attachments/files/20656071/ir_fixed_O1.txt) Here are the same but with O3 level. And here original IR has issue: [ir_bug_O3.txt](https://github.com/user-attachments/files/20656073/ir_bug_O3.txt) [ir_fixed_O3.txt](https://github.com/user-attachments/files/20656072/ir_fixed_O3.txt) The issue is here: Original: ``` %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, ptr noundef nonnull %9, ptr noundef nonnull %10) %62 = icmp eq i32 %61, 0 call void @llvm.assume(i1 %62) call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %13) #7 ``` Fixed: ``` %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, ptr noundef nonnull %9, ptr noundef nonnull %10) %62 = icmp eq i32 %61, 0 br i1 %62, label %63, label %castDECIMAL_utf8.exit ``` If I understand correctly in original IR LLVM just assumed that gdv_fn_dec_from_string should always return 0. So code that should have been skipped by return is now executed. This and uninitialised out_high, out_low cause SIGSEGV later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org