https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
Bug ID: 102510
Summary: Function call has unnecessary aliasing check
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: dwwork at gmail dot com
Target Milestone: ---

The following two procedures are semantically equivalent: each adds two
fixed-size arrays and stores the sum in a third. When compiled with
"-O3 -mavx" for x86_64, I expect to see a single 256-bit AVX add. The first
version does this correctly, while the second has an aliasing check with a
vectorized branch and a scalar branch (I think).

The code generated for the second version looks wrong to me: it should
produce vectorized assembly similar to the first, as Fortran does not allow
function arguments to alias. I could be wrong of course, but that is my
understanding.

subroutine add2vecs1(a,b,c)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
end subroutine

Output Assembly (from godbolt.org, https://godbolt.org/z/aedEe7rGM):

add2vecs1_:
        vmovups ymm0, YMMWORD PTR [rdi]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rdx], ymm0
        vzeroupper
        ret

function add2vecs2(a,b)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8) :: add2vecs2
    add2vecs2 = a + b
end function

Output Assembly:

add2vecs2_:
        mov     rax, QWORD PTR [rdi+40]
        mov     rcx, QWORD PTR [rdi]
        test    rax, rax
        je      .L5
        cmp     rax, 1
        jne     .L11
.L5:
        vmovups ymm0, YMMWORD PTR [rdx]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rcx], ymm0
        vzeroupper
        ret
.L11:
        vmovups xmm1, XMMWORD PTR [rdx]
        vaddps  xmm0, xmm1, XMMWORD PTR [rsi]
        lea     rdi, [rcx+rax*8]
        mov     r8, rax
        sal     r8, 4
        vmovss  DWORD PTR [rcx], xmm0
        vextractps      DWORD PTR [rcx+rax*4], xmm0, 1
        vextractps      DWORD PTR [rcx+rax*8], xmm0, 2
        vextractps      DWORD PTR [rdi+rax*4], xmm0, 3
        vmovups xmm0, XMMWORD PTR [rdx+16]
        vaddps  xmm0, xmm0, XMMWORD PTR [rsi+16]
        lea     rdi, [rcx+r8]
        lea     rdx, [rdi+rax*8]
        vmovss  DWORD PTR [rcx+r8], xmm0
        vextractps      DWORD PTR [rdi+rax*4], xmm0, 1
        vextractps      DWORD PTR [rdi+rax*8], xmm0, 2
        vextractps      DWORD PTR [rdx+rax*4], xmm0, 3
        ret
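
For anyone who wants to confirm that the two procedures really compute the same result before comparing the generated code, here is a minimal self-checking driver. It is only a sketch and not part of the original test case: the module name add2vecs_mod, the program name check_add2vecs, and the input values are made up for illustration. The procedures are placed in a module because an array-valued function needs an explicit interface at the call site; this changes the symbol names and may change the emitted assembly compared with the external procedures shown above.

```fortran
! Hypothetical driver: module/program names and test data are assumptions,
! not taken from the bug report.
module add2vecs_mod
    use iso_fortran_env, only: r32 => real32
    implicit none
contains
    ! Subroutine form from the report: result returned through argument c.
    subroutine add2vecs1(a, b, c)
        real(r32), dimension(8), intent(in)  :: a, b
        real(r32), dimension(8), intent(out) :: c
        c = a + b
    end subroutine add2vecs1

    ! Function form from the report: result returned as the function value.
    function add2vecs2(a, b)
        real(r32), dimension(8), intent(in) :: a, b
        real(r32), dimension(8) :: add2vecs2
        add2vecs2 = a + b
    end function add2vecs2
end module add2vecs_mod

program check_add2vecs
    use iso_fortran_env, only: r32 => real32
    use add2vecs_mod, only: add2vecs1, add2vecs2
    implicit none
    real(r32), dimension(8) :: a, b, c1, c2
    integer :: i

    ! Small integer-valued inputs so the sums are exactly representable.
    a = [(real(i, r32), i = 1, 8)]
    b = [(real(10 * i, r32), i = 1, 8)]

    call add2vecs1(a, b, c1)
    c2 = add2vecs2(a, b)

    ! Stop with a non-zero status if the two forms ever disagree.
    if (any(c1 /= c2)) error stop "add2vecs1 and add2vecs2 disagree"
    print *, "results match"
end program check_add2vecs
```

Building this with something like "gfortran -O3 -mavx -S" should allow the module versions of both procedures to be compared in the resulting assembly, though I have not checked whether the module wrapper changes the outcome.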