https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On zen3 I get 0.75MP/s for GCC and 0.80MP/s for clang, so the difference is
only about 6.6%, but it seems reproducible.

Profile looks comparable:

gcc:

  30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
  26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
   3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
   3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf

clang:

  34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
  28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
   4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
   4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform


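In the first loop clang seems to if-convert while GCC doesn't. The first
listing below (GCC) keeps a compare-and-branch (cmp/jae) around the store,
while the second (clang) uses cmovbe: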
  0.59 │       lea          kSLog2Table,%rdi
  3.69 │       vmovss       (%rdi,%rax,4),%xmm0
  0.98 │ 6f:   vcvtsi2ss    %edx,%xmm2,%xmm1
  0.63 │       vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
 38.16 │       vmovss       %xmm1,0x0(%r13)
  5.48 │       cmp          %r12d,0xc(%r13)
  0.06 │     ↓ jae          89             
       │       mov          %r12d,0xc(%r13)
  0.99 │ 89:   mov          0x4(%r13),%edi 
  0.96 │ 8d:   xor          %eax,%eax      
  0.40 │       test         %r12d,%r12d    
  0.60 │       setne        %al                                                 



       │       vcvtsd2ss    %xmm0,%xmm0,%xmm1                                   
  0.02 │362:   mov          %r15d,%eax                                          
  0.57 │       imul         %r12d,%eax                                          
  0.00 │       cmp          %r12d,%r9d                                          
  0.03 │       cmovbe       %r12d,%r9d                                          
  0.02 │       vmovd        %eax,%xmm0                                          
  0.08 │       vpinsrd      $0x1,%r15d,%xmm0,%xmm0                              
  1.50 │       vpaddd       %xmm0,%xmm4,%xmm4                                   
  1.08 │       vcvtsi2ss    %r15d,%xmm5,%xmm0                                   
  0.87 │       vfnmadd231ss %xmm0,%xmm1,%xmm3                                   
  5.40 │       vmovaps      %xmm3,%xmm0                                         
  0.02 │38c:   xor          %eax,%eax                                           
  0.16 │       cmp          $0x4,%r15d
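
To make the difference concrete, here is a minimal C sketch of the two shapes.
It is not the libwebp source: the Stats struct, its field names and both
function names are made up for illustration. The first function updates the
maximum through the pointer behind a branch, roughly matching the cmp/jae/mov
sequence; the second keeps the values in locals and uses a select, which is
the form that typically maps to cmov. Whether a given compiler actually emits
a branch or a cmov for either form depends on its version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the statistics updated in the loop
   (not the actual libwebp type). */
typedef struct {
  float entropy;
  uint32_t max_val;
} Stats;

/* Branchy shape: both fields are updated in memory on every iteration,
   and the maximum is stored only when the comparison succeeds. */
static void update_branchy(Stats* s, const uint32_t* vals,
                           const float* logs, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    s->entropy -= (float)vals[i] * logs[i];
    if (s->max_val < vals[i]) s->max_val = vals[i];
  }
}

/* If-converted shape: values live in locals and the maximum is computed
   with an unconditional select; results are written back once. */
static void update_select(Stats* s, const uint32_t* vals,
                          const float* logs, size_t n) {
  float entropy = s->entropy;
  uint32_t max_val = s->max_val;
  for (size_t i = 0; i < n; ++i) {
    entropy -= (float)vals[i] * logs[i];
    max_val = (max_val < vals[i]) ? vals[i] : max_val;
  }
  s->entropy = entropy;
  s->max_val = max_val;
}
```

Both functions compute the same result; only the control flow and the number
of memory updates inside the loop differ.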
