[Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)

hubicka at gcc dot gnu.org via Gcc-bugs Tue, 18 Jul 2023 07:50:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649


--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Chasing profile update bugs out of the two hottest functions did not solve the
regression. Moreover, the weekly testers confirm that it was not noise on the
Zen machines either.

Before the change we get:

  34.58%  sphinx_livepret  [.] mgau_eval                              ◆
  26.61%  sphinx_livepret  [.] vector_gautbl_eval_logs3               ▒
   8.94%  sphinx_livepret  [.] subvq_mgau_shortlist                   ▒
   7.36%  sphinx_livepret  [.] logs3_add                              ▒
   5.66%  sphinx_livepret  [.] approx_cont_mgau_frame_eval            ▒
   4.68%  sphinx_livepret  [.] mdef_sseq2sen_active                   ▒
   3.38%  sphinx_livepret  [.] dict2pid_comsenscr                     ▒
   1.66%  sphinx_livepret  [.] hmm_vit_eval_3st                       ▒
   0.90%  sphinx_livepret  [.] lextree_hmm_eval                       ▒
   0.73%  sphinx_livepret  [.] lextree_hmm_propagate                  ▒
   0.71%  sphinx_livepret  [.] lextree_enter                          ▒
   0.68%  sphinx_livepret  [.] fe_fft                                 ▒
   0.49%  sphinx_livepret  [.] dict2pid_comsseq2sen_active            ▒
   0.35%  sphinx_livepret  [.] lextree_ssid_active                    ▒
   0.20%  sphinx_livepret  [.] vithist_rescore                        ▒

So the difference seems to be in mgau_eval.
Both versions of mgau_eval have almost the same code layout. The main
difference is register allocation. In the old version we do more spilling
around the call:

  0.01 │       and          $0xffffffffffffffe0,%rsp                  ▒
  0.14 │       mov          %rcx,%rbx                                 ▒
  0.00 │       sub          $0xa0,%rsp                                ▒
  0.04 │       mov          0x10(%rdi),%rax                           ▒
  0.13 │       mov          0x8(%rdi),%r15d                           ▒
  0.01 │       vmovaps      %xmm3,0x80(%rsp)                          ▒
  0.22 │       vmovaps      %xmm2,0x90(%rsp)                          ▒
  0.03 │       mov          %rdi,0x70(%rsp)                           ▒
  0.05 │       lea          (%rax,%rdx,8),%r14                        ▒
  0.01 │       call         log_to_logs3_factor                       ▒
  1.00 │       test         %r13,%r13                                 ▒
  0.00 │       vxorps       %xmm4,%xmm4,%xmm4                         ▒
  0.02 │       vmovsd       %xmm0,0x78(%rsp)                          ▒
  0.00 │       je           433                                       ▒
  0.01 │       movslq       0x0(%r13),%rax                            ▒
  0.02 │       mov          $0xc8000000,%edi                          ▒
  0.01 │       vmovaps      0x90(%rsp),%xmm2                          ▒
  0.23 │       vmovaps      0x80(%rsp),%xmm3                          ▒
  0.09 │       test         %eax,%eax                                 ▒
  0.00 │       js           3f9                                       ▒

The new version is missing the spill of xmm2/3:

  0.02 │       and          $0xffffffffffffffe0,%rsp                  ▒
  0.03 │       mov          %rcx,%rbx                                 ▒
  0.01 │       add          $0xffffffffffffff80,%rsp                  ▒
  0.03 │       mov          0x10(%rdi),%rax                           ▒
  0.16 │       mov          0x8(%rdi),%r15d                           ▒
  0.06 │       mov          %rdi,0x50(%rsp)                           ▒
  0.12 │       lea          (%rax,%rdx,8),%r14                        ▒
  0.01 │       call         log_to_logs3_factor                       ▒
  0.75 │       test         %r12,%r12                                 ▒
  0.00 │       vxorps       %xmm3,%xmm3,%xmm3                         ▒
  0.01 │       vmovsd       %xmm0,0x58(%rsp)                          ▒
  0.01 │       je           3f2                                       ▒
  0.01 │       movslq       (%r12),%rcx                               ▒
  0.00 │       mov          $0xc8000000,%edi                          ▒
       │       test         %ecx,%ecx                                 ▒
  0.14 │       js           3b8                                       ▒

Which looks better. log_to_logs3_factor just returns a constant:

Percent│     vmovsd invlogB,%xmm0                                      
       │     ret                                                       

I wonder why we no longer need to spill. log_to_logs3_factor is from another
translation unit and this is a non-LTO build. Maybe there are undefined
variables.
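
For reference, here is a minimal C sketch of the situation. It is not the
actual sphinx3 source. On x86-64 SysV every xmm register is call-clobbered, so
a floating-point value that is live across a call to an opaque function has to
be spilled (or recomputed) around that call. The names scale_sum, a and b are
made up for illustration, the value of invlogB is arbitrary, and the body of
log_to_logs3_factor is only guessed from the two-instruction disassembly above.

```c
/* Hypothetical stand-in for the real global; the value is arbitrary. */
static double invlogB = 2.0;

/* Guessed from the disassembly: just load a global and return it.
 * noinline keeps a real call in the caller.  Note that in a single file
 * GCC may still analyze the callee (e.g. via -fipa-ra); in the bug the
 * callee lives in another translation unit, so no such information is
 * available to the caller. */
__attribute__((noinline)) double log_to_logs3_factor(void)
{
    return invlogB;
}

/* a and b arrive in xmm0 and xmm1 and are still needed after the call.
 * Since the callee may clobber any xmm register, the compiler has to
 * keep them somewhere safe (typically a stack slot) across the call. */
__attribute__((noinline)) double scale_sum(double a, double b)
{
    double f = log_to_logs3_factor();
    return (a + b) * f;
}
```

Compiling this with -O2 -S and looking at scale_sum should show the save and
reload around the call when the callee is truly opaque. If the values were not
live across the call, no spill would be needed, which is one possible
explanation to check for in the new code.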

The new version does:
  0.29 │       vmovhps      %xmm4,0x70(%rsp)                          ▒
  0.11 │       vmovaps      0x70(%rsp),%xmm7                          ▒
and this looks odd.
