Issue 178632
Summary [X86] Generation of slower vperm2f128 (?)
Labels backend:X86
Assignees
Reporter nikic
    https://llvm.godbolt.org/z/49x1cP1Mf

Codegen here has changed from
```
test:                                   # @test
        vmovsd  xmm0, qword ptr [esp + 28]      # xmm0 = mem[0],zero
        vmovsd  xmm1, qword ptr [esp + 12]      # xmm1 = mem[0],zero
        vmovhps xmm0, xmm0, qword ptr [esp + 20] # xmm0 = xmm0[0,1],mem[0,1]
        vmovhps xmm1, xmm1, qword ptr [esp + 4] # xmm1 = xmm1[0,1],mem[0,1]
        vinsertf128     ymm0, ymm0, xmm1, 1
        ret
```
to
```
test: # @test
        vperm2f128      ymm0, ymm0, ymmword ptr [esp + 4], 35 # ymm0 = mem[2,3,0,1]
        vshufpd ymm0, ymm0, ymm0, 5             # ymm0 = ymm0[1,0,3,2]
        ret
```
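
As a sanity check that both sequences compute the same result, here is a minimal Python sketch that models the shuffles using the element indices from the assembly comments. It assumes `mem[i]` denotes the double at `[esp + 4 + 8*i]`. The function names are illustrative only, and it models data movement, not performance.

```python
def old_sequence(mem):
    # mem[i] stands for the double at [esp + 4 + 8*i] (assumption of this sketch).
    xmm0 = [mem[3], mem[2]]  # vmovsd [esp + 28], then vmovhps [esp + 20]
    xmm1 = [mem[1], mem[0]]  # vmovsd [esp + 12], then vmovhps [esp + 4]
    return xmm0 + xmm1       # vinsertf128 places xmm1 in the upper 128 bits


def new_sequence(mem):
    # vperm2f128 with immediate 35: ymm0 = mem[2,3,0,1] (swap the 128-bit halves)
    swapped = [mem[i] for i in (2, 3, 0, 1)]
    # vshufpd with immediate 5: ymm0 = ymm0[1,0,3,2] (swap within each half)
    return [swapped[i] for i in (1, 0, 3, 2)]


mem = [10.0, 11.0, 12.0, 13.0]
# Both sequences should reverse the four doubles.
assert old_sequence(mem) == new_sequence(mem) == mem[::-1]
```

Under this model, both forms produce the four doubles in reverse order, so the question is purely about the relative cost of the two sequences.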

llvm-mca seems to think that the new form is slower (for any `-mcpu` I tried).

I'm not sure whether the MCA estimate here is correct, so I wanted to check with someone who is more familiar with this...

(With `target-cpu=znver3` or similar, a `vpermpd` is generated instead.)