https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On zen3 I get 0.75MP/s for GCC and 0.80MP/s for clang, so only 6.6%, but seems
reproducible.

Profile looks comparable:

gcc
  30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
  26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
   3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
   3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf

clang:
  34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
  28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
   4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
   4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform

In the first loop clang seems to ifconvert while GCC doesn't:

  0.59 │       lea          kSLog2Table,%rdi
  3.69 │       vmovss       (%rdi,%rax,4),%xmm0
  0.98 │ 6f:   vcvtsi2ss    %edx,%xmm2,%xmm1
  0.63 │       vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
 38.16 │       vmovss       %xmm1,0x0(%r13)
  5.48 │       cmp          %r12d,0xc(%r13)
  0.06 │     ↓ jae          89
       │       mov          %r12d,0xc(%r13)
  0.99 │ 89:   mov          0x4(%r13),%edi
  0.96 │ 8d:   xor          %eax,%eax
  0.40 │       test         %r12d,%r12d
  0.60 │       setne        %al
       │       vcvtsd2ss    %xmm0,%xmm0,%xmm1
  0.02 │362:   mov          %r15d,%eax
  0.57 │       imul         %r12d,%eax
  0.00 │       cmp          %r12d,%r9d
  0.03 │       cmovbe       %r12d,%r9d
  0.02 │       vmovd        %eax,%xmm0
  0.08 │       vpinsrd      $0x1,%r15d,%xmm0,%xmm0
  1.50 │       vpaddd       %xmm0,%xmm4,%xmm4
  1.08 │       vcvtsi2ss    %r15d,%xmm5,%xmm0
  0.87 │       vfnmadd231ss %xmm0,%xmm1,%xmm3
  5.40 │       vmovaps      %xmm3,%xmm0
  0.02 │38c:   xor          %eax,%eax
  0.16 │       cmp          $0x4,%r15d