https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236
Jan Hubicka changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
--- Comment #2 from Jan Hubicka ---
On Zen 3 I get 0.75 MP/s for GCC and 0.80 MP/s for clang, so the difference is
only 6.6%, but it seems reproducible.
The profiles look comparable:
gcc:
  30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
  26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
   3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
   3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf
clang:
  34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
  28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
   4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
   4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform
In the first loop clang seems to ifconvert while GCC doesn't:
0.59 │ lea kSLog2Table,%rdi
3.69 │ vmovss (%rdi,%rax,4),%xmm0
0.98 │ 6f: vcvtsi2ss %edx,%xmm2,%xmm1
0.63 │ vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
38.16 │ vmovss %xmm1,0x0(%r13)
5.48 │ cmp %r12d,0xc(%r13)
0.06 │ ↓ jae 89
│ mov %r12d,0xc(%r13)
0.99 │ 89: mov 0x4(%r13),%edi
0.96 │ 8d: xor %eax,%eax
0.40 │ test %r12d,%r12d
0.60 │ setne %al
│ vcvtsd2ss %xmm0,%xmm0,%xmm1
0.02 │362: mov %r15d,%eax
0.57 │ imul %r12d,%eax
0.00 │ cmp %r12d,%r9d
0.03 │ cmovbe %r12d,%r9d
0.02 │ vmovd %eax,%xmm0
0.08 │ vpinsrd $0x1,%r15d,%xmm0,%xmm0
1.50 │ vpaddd %xmm0,%xmm4,%xmm4
1.08 │ vcvtsi2ss %r15d,%xmm5,%xmm0
0.87 │ vfnmadd231ss %xmm0,%xmm1,%xmm3
5.40 │ vmovaps %xmm3,%xmm0
0.02 │38c: xor %eax,%eax
0.16 │ cmp $0x4,%r15d
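
For illustration only, here is a minimal C sketch of the two shapes seen in the listings: a running maximum updated with a compare, branch and store to memory (cmp/jae/mov), versus the same maximum kept in a register and updated with a conditional move (cmp/cmovbe). The Stats struct, its field names and both function names are hypothetical and are not taken from libwebp. The second function assumes that vals does not alias *s. Whether a given compiler actually emits a branch or a cmov for either form depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical stand-in for the statistics the hot loop accumulates;
   the field names are illustrative, not taken from libwebp. */
typedef struct {
  uint32_t max_val;
  uint32_t sum;
} Stats;

/* Branchy form: the maximum is updated in memory through the pointer,
   guarded by a branch. This resembles the cmp/jae/mov sequence. */
void update_branch(Stats *s, const uint32_t *vals, int n) {
  for (int i = 0; i < n; ++i) {
    s->sum += vals[i];
    if (s->max_val < vals[i]) {
      s->max_val = vals[i];
    }
  }
}

/* If-converted form: the maximum lives in a local and is updated with a
   select, which a compiler can lower to cmp/cmov. The results are stored
   once after the loop. Assumes vals does not alias *s. */
void update_select(Stats *s, const uint32_t *vals, int n) {
  uint32_t max_val = s->max_val;
  uint32_t sum = s->sum;
  for (int i = 0; i < n; ++i) {
    const uint32_t v = vals[i];
    sum += v;
    max_val = (max_val < v) ? v : max_val;
  }
  s->max_val = max_val;
  s->sum = sum;
}
```

Both functions compute the same result for non-aliasing inputs. The difference is only in where the running values live during the loop and whether the update is a control dependency or a data dependency.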