DenisTarasyuk commented on issue #46708:
URL: https://github.com/apache/arrow/issues/46708#issuecomment-2955947276

   Additional information.
   I think this issue is caused by undefined behavior in `castDECIMAL_utf8`: when parsing fails, the code below returns without setting `out_high` and `out_low`.
   ```
     int32_t status =
         gdv_fn_dec_from_string(context, in, in_length, &precision_from_str, &scale_from_str,
                                &dec_high_from_str, &dec_low_from_str);
     if (status != 0) {
       return;
     }
   ```
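A minimal sketch of the kind of fix that avoids this, written as a simplified C model rather than the actual Gandiva source. `cast_decimal_utf8_fixed` and `parse_decimal_stub` are hypothetical names, the stub only accepts plain digit strings, and precision/scale handling is omitted. The point is that every return path writes `*out_high` and `*out_low` (zeros on the error path), so the caller never reads an uninitialized value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for gdv_fn_dec_from_string: accepts only
 * non-empty strings of ASCII digits (no sign, no decimal point) and
 * returns non-zero for anything else. Overflow is not checked. */
static int32_t parse_decimal_stub(const char* in, int32_t in_length,
                                  int64_t* high, uint64_t* low) {
  if (in_length <= 0) return -1;
  uint64_t value = 0;
  for (int32_t i = 0; i < in_length; ++i) {
    if (in[i] < '0' || in[i] > '9') return -1;  /* reject non-digits */
    value = value * 10 + (uint64_t)(in[i] - '0');
  }
  *high = 0;
  *low = value;
  return 0;
}

/* Simplified, hypothetical model of castDECIMAL_utf8 with the fix:
 * both outputs are written on every return path. */
static void cast_decimal_utf8_fixed(const char* in, int32_t in_length,
                                    int64_t* out_high, uint64_t* out_low) {
  int64_t dec_high = 0;
  uint64_t dec_low = 0;
  int32_t status = parse_decimal_stub(in, in_length, &dec_high, &dec_low);
  if (status != 0) {
    *out_high = 0;  /* the fix: define the outputs on the error path too */
    *out_low = 0;
    return;
  }
  *out_high = dec_high;
  *out_low = dec_low;
}
```

With the outputs always written, there is no read of an uninitialized value left for the optimizer to treat as undefined behavior, so it has to keep the `status != 0` branch.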
   LLVM then optimizes away parts of the code because of this undefined behavior.
   The linked PR contains a test that reproduces the issue.
   Here are the IR dumps:
   [ir_bug_raw.txt](https://github.com/user-attachments/files/20656070/ir_bug_raw.txt) is the IR with the bug reproduced. The `_raw` IR was captured before LLVM applied any optimisations.
   [ir_fixed_raw.txt](https://github.com/user-attachments/files/20656068/ir_fixed_raw.txt) is the unoptimised IR with the fix, which sets `out_high` and `out_low` to 0.
   Then I ran LLVM's `opt` like this:
   `/llvm/bin/opt -passes='default<O1>' -debug-pass-manager -S ir_fixed_raw.txt -o ir_fixed_O1.txt`
   This produces the optimised IR.
   Here are the original and fixed IR at optimisation level O1. Both work fine:
   [ir_bug_O1.txt](https://github.com/user-attachments/files/20656069/ir_bug_O1.txt)
   [ir_fixed_O1.txt](https://github.com/user-attachments/files/20656071/ir_fixed_O1.txt)

   Here are the same files at O3, where the original IR has the issue:
   [ir_bug_O3.txt](https://github.com/user-attachments/files/20656073/ir_bug_O3.txt)
   [ir_fixed_O3.txt](https://github.com/user-attachments/files/20656072/ir_fixed_O3.txt)
   
   The issue is here:
   
   Original:
   ```
     %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, ptr noundef nonnull %9, ptr noundef nonnull %10)
     %62 = icmp eq i32 %61, 0
     call void @llvm.assume(i1 %62)
     call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %13) #7
   ```
   Fixed:
   ```
     %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, ptr noundef nonnull %9, ptr noundef nonnull %10)
     %62 = icmp eq i32 %61, 0
     br i1 %62, label %63, label %castDECIMAL_utf8.exit
   ```
   If I understand correctly, in the original IR LLVM simply assumed that `gdv_fn_dec_from_string` always returns 0 (the `llvm.assume` on `%62`), so the code that the early `return` should have skipped now runs on the error path as well. That, combined with the uninitialised `out_high` and `out_low`, causes the SIGSEGV later.
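Assuming the reading above is right, the O3 IR behaves roughly as if the error branch had been declared unreachable. The C analogy below contrasts the two control flows. `parse_status_stub`, `convert_as_written` and `convert_as_optimized` are hypothetical names, and `__builtin_unreachable()` is a GCC/Clang builtin used here only to mimic the effect of `llvm.assume(status == 0)`; it is not something Gandiva emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the status returned by gdv_fn_dec_from_string:
 * 0 if the first character is a digit, 1 otherwise. */
static int32_t parse_status_stub(const char* in) {
  return (in[0] >= '0' && in[0] <= '9') ? 0 : 1;
}

/* Control flow as written in the source (and kept in the fixed IR):
 * branch on the status and leave early on failure. */
static int convert_as_written(const char* in) {
  int32_t status = parse_status_stub(in);
  if (status != 0) {
    return 0; /* error path */
  }
  return 1; /* success path */
}

/* Rough analogy for the buggy O3 IR: the error path is treated as
 * unreachable, so the compiler may drop the status check entirely.
 * Calling this with input that fails to parse is undefined behavior. */
static int convert_as_optimized(const char* in) {
  int32_t status = parse_status_stub(in);
  if (status != 0) {
    __builtin_unreachable(); /* same meaning as assuming status == 0 */
  }
  return 1;
}
```

Only `convert_as_written` is safe for inputs that fail to parse. In `convert_as_optimized` the check carries no information for the compiler, which mirrors how the success-path code ends up running on the error path.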
   


-- 
This is an automated message from the Apache Git Service.