https://bugs.llvm.org/show_bug.cgi?id=50598

            Bug ID: 50598
           Summary: suboptimal vectorization on function call with >10
                    parameters with -march=znver2
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected]

Example code can be found on [GitHub](https://github.com/Apache-HB/bench).

When benchmarking this code on a 2700X (znver2), the code LLVM generates at the
highest optimization level runs 10% slower on average than GCC-generated code.

LLVM places arguments on the stack with
```
vbroadcastsd r256, m64
vmovups m256, r256
```

which is slower on znver2 than the less vectorized sequence
```
push r64
push r64
push r64
push r64
```

Without vectorization enabled, LLVM currently generates
```
mov m64, imm64
mov m64, imm64
mov m64, imm64
mov m64, imm64
```
which is 20% slower than GCC and 10% slower than the vectorized equivalent on
znver2.
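
For readers without access to the linked repository, here is a minimal sketch of the kind of call site described above. It is not taken from the benchmark: the names `sum12` and `call_sum12`, the parameter count of 12, and the choice of passing the same constant in every slot are all assumptions made for illustration (the repeated constant is a guess based on the `vbroadcastsd` in the vectorized sequence). Whether this exact input reproduces the reported codegen has not been verified.

```c
#include <stdint.h>

/* Hypothetical callee with 12 integer parameters. Under both the Windows x64
   and System V x86-64 calling conventions only the first few are passed in
   registers, so the remainder must be written to the stack by the caller. */
__attribute__((noinline))
uint64_t sum12(uint64_t a, uint64_t b, uint64_t c, uint64_t d,
               uint64_t e, uint64_t f, uint64_t g, uint64_t h,
               uint64_t i, uint64_t j, uint64_t k, uint64_t l)
{
    return a + b + c + d + e + f + g + h + i + j + k + l;
}

/* Hypothetical caller: every argument is the same constant, so adjacent
   stack slots hold identical 64-bit values. That gives the compiler the
   option of filling four slots with one broadcast plus one 256-bit store
   instead of four scalar stores or pushes. */
uint64_t call_sum12(void)
{
    return sum12(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7);
}
```

Compiling this with `-O3 -march=znver2 -S` under both compilers and comparing the argument setup in `call_sum12` should show which of the sequences above each compiler picks. In a real test the callee would probably need to live in a separate translation unit so that interprocedural optimization cannot specialize it away.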
