[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-24 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570

--- Comment #5 from JuzheZhong  ---
It seems that we don't have any bugs in current SPEC 2017 testing.

So I strongly suggest the "full coverage" SPEC 2017 testing that I mentioned
in PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

-march=rv64gcv --param=riscv-autovec-lmul=m2
-march=rv64gcv --param=riscv-autovec-lmul=m4
-march=rv64gcv --param=riscv-autovec-lmul=m8
-march=rv64gcv --param=riscv-autovec-lmul=dynamic

-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=m2
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=m4
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=m8
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=dynamic

-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=m2
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=m4
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=m8
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=dynamic

-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax

-march=rv64gcv_zvl256b --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl256b --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax

-march=rv64gcv_zvl512b --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv_zvl512b --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax

Could you trigger these test runs?
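
The flag combinations listed above follow a regular pattern, so they could be generated rather than maintained by hand. The following is only a sketch: the helper name `flag_matrix` and the constants are made up for illustration, and only the flag strings themselves come from the list above.

```python
from itertools import product

# Values taken from the list above; the names are illustrative only.
ARCHES = ["rv64gcv", "rv64gcv_zvl256b", "rv64gcv_zvl512b"]
LMULS = ["m2", "m4", "m8", "dynamic"]
FIXED_VLMAX = "--param=riscv-autovec-preference=fixed-vlmax"


def flag_matrix():
    """Return every flag combination from the list, in the same order."""
    configs = []
    # Default autovec preference: each -march with an explicit LMUL.
    for arch, lmul in product(ARCHES, LMULS):
        configs.append(f"-march={arch} --param=riscv-autovec-lmul={lmul}")
    # fixed-vlmax: first with the default LMUL, then with each explicit LMUL.
    for arch in ARCHES:
        configs.append(f"-march={arch} {FIXED_VLMAX}")
        for lmul in LMULS:
            configs.append(
                f"-march={arch} --param=riscv-autovec-lmul={lmul} {FIXED_VLMAX}"
            )
    return configs


if __name__ == "__main__":
    for flags in flag_matrix():
        print(flags)
```

Each printed line is one complete set of compiler options that could be dropped into a SPEC config or a CI job definition.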

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-24 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Jeffrey A. Law  ---
Just looked a little closer.  Looks like the same signature.  Essentially you
can't use -Ofast (or fast-math in general) with this benchmark, especially with
vectorization.

*** This bug has been marked as a duplicate of bug 84201 ***

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-24 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570

--- Comment #3 from Jeffrey A. Law  ---
See PR84201 for more details, as well as
https://www.spec.org/cpu2017/Docs/benchmarks/549.fotonik3d_r.html

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570

--- Comment #2 from Robin Dapp  ---
I'm pretty certain this is "works as intended" and -Ofast causes the precision
to be different than with -O3 (and dependent on the target).  See also:


It has been reported that with gfortran -Ofast -march=native verification
errors may be seen, for example:


*** Miscompare of pscyee.out; for details see
    /data2/johnh/out.v1.1.5/benchspec/CPU/549.fotonik3d_r/run/run_base_refrate_Ofastnative./pscyee.out.mis
0646:   -1.91273086037953E-17, -1.46491401919706E-15,
        -1.91273086057460E-17, -1.46491401919687E-15,
                    ^
0668:   -1.91251317582607E-17, -1.42348205527085E-15,
        -1.91251317602571E-17, -1.42348205527068E-15,
                   ^

The errors may occur with other compilers as well, depending on your particular
compiler version, hardware platform, and optimization options.

The problem arises when a compiler chooses to vectorize a particular loop from
power.F90 line number 369

369   do ifreq = 1, tmppower%nofreq
370     frequency(ifreq,ipower) = freq
371     freq = freq + freqstep
372   end do



from https://www.spec.org/cpu2017/Docs/benchmarks/549.fotonik3d_r.html
which further states:


Workaround: You will need to specify optimization options that do not cause
this loop to be vectorized. For example, on a particular platform studied in
mid-2020 using GCC 10.2, these results were seen:

OK -Ofast -march=native -fno-unsafe-math-optimizations

If you apply one of the above workarounds in base, be sure to obey the
same-for-all rule which requires that all benchmarks in a suite of a given
language must use the same flags. For example, the sections below turn off
unsafe math optimizations for all Fortran modules in the floating point rate
and floating point speed benchmark suites:

default=base: 
  OPTIMIZE   = -Ofast -flto -march=native 
fprate,fpspeed=base:
  FOPTIMIZE  = -fno-unsafe-math-optimizations
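
To illustrate why vectorizing the loop at power.F90 line 369 can change the low-order digits, here is a rough model of the two evaluation orders. It is a sketch under assumptions, not what GCC actually emits: `scalar_fill` mirrors the sequential loop, while `vector_fill` models a lane-wise induction with a hypothetical vector factor of 4, where lane k starts at `start + k*step` and every lane advances by `vf*step`. That rewrite reassociates the additions, which is only permitted under fast-math style flags. The function names and the input values are made up for the example.

```python
def scalar_fill(start, step, n):
    """Sequential loop: each element is the previous one plus step."""
    out, freq = [], start
    for _ in range(n):
        out.append(freq)
        freq = freq + step
    return out


def vector_fill(start, step, n, vf=4):
    """Model of a vectorized induction (assumed form, vf lanes)."""
    lanes = [start + k * step for k in range(vf)]  # initial vector
    vstep = vf * step                              # per-iteration vector step
    out = []
    while len(out) < n:
        out.extend(lanes)
        lanes = [x + vstep for x in lanes]
    return out[:n]


scalar = scalar_fill(0.0, 0.1, 16)
vector = vector_fill(0.0, 0.1, 16)

# Indices where the two evaluation orders round differently.
mismatches = [i for i, (a, b) in enumerate(zip(scalar, vector)) if a != b]
print("mismatching indices:", mismatches)
print("largest difference:", max(abs(a - b) for a, b in zip(scalar, vector)))
```

Both sequences are equally valid approximations of the same arithmetic progression; they simply accumulate rounding error along different paths, which is the kind of last-digits discrepancy shown in the miscompare output above.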

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-23 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570

--- Comment #1 from Vineet Gupta  ---
This one is a headache, as we don't know where the problem is and it takes
~7hr for a QEMU run to finish.

The good thing is that there is a comparison point, as the VLA build works fine.

(1) Use bloat-o-meter (from the Linux kernel) to diff the VLS (nok) and VLA (ok)
builds.

  Function                                       old     new   delta
  init_                                         6722    6752     +30
  __huygens_mod_MOD_huygense                   17078   17990    +912
  __huygens_mod_MOD_huygensh                   14412   15614   +1202
  __huygens_mod_MOD_uin.isra                    2922    2944     +22
  __material_mod_MOD_mat_updatee                4228    4272     +44

  __mur_mod_MOD_mur_init                        9054    9162    +108
  __mur_mod_MOD_mur_storee                      2546    2446    -100
  __mur_mod_MOD_mur_updatee                    10124   10354    +230

  __pec_mod_MOD_pec_init                        8522    8046    -476

  __plane_source_mod_MOD_plane_source_init      6942    7072    +130

  __power_mod_MOD___copy_power_mod_Powertyp       14      26     +12
  __power_mod_MOD_power_dft                     1280    1156    -124
  __power_mod_MOD_power_init                    9830    9994    +164
  __power_mod_MOD_power_print                   2304    1556    -748

  __upml_mod_MOD_upml_allocate.isra            19384   19614    +230
  __upml_mod_MOD_upml_init                     25564   26178    +614
  __upml_mod_MOD_upml_set_eps_arrays.isra       5406    6356    +950
  __upml_mod_MOD_upml_updatee                  32516   27130   -5386
  __upml_mod_MOD_upml_updatee_simple           36112   29612   -6500
  __upml_mod_MOD_upml_updateh                  15962   15992     +30

  writeout_                                    10856   11002    +146

(2) Assuming the issue is in one of the functions above (which may not be true),
manually rebuild, changing the build flags to VLA one module at a time, relink,
rerun QEMU and compare the output.

   - This identified power.fppized.f90 as the culprit

(3) Manually split the power module into multiple files, one function at a
time, and repeat the same exercise to identify the function.
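
The one-module-at-a-time search in steps (2) and (3) could be driven by a small script along these lines. This is only a sketch: `find_culprits`, `fake_run` and the module list are hypothetical (the names are guessed from the symbol prefixes in the table above), and `fake_run` stands in for the real rebuild, relink and ~7hr QEMU run so that the example is runnable.

```python
def find_culprits(modules, run_passes):
    """Switch one module at a time from VLS to VLA flags and
    collect the modules whose switch makes the run pass."""
    culprits = []
    for mod in modules:
        flags = {m: "VLS" for m in modules}  # baseline: failing VLS build
        flags[mod] = "VLA"                   # flip a single module
        if run_passes(flags):
            culprits.append(mod)
    return culprits


def fake_run(flags):
    """Hypothetical stand-in for rebuild + relink + QEMU run + compare.
    Pretends the output only verifies when 'power' is built as VLA."""
    return flags["power"] == "VLA"


# Illustrative module names, not actual file names.
modules = ["huygens", "material", "mur", "pec", "plane_source", "power", "upml"]
print(find_culprits(modules, fake_run))
```

The same loop applies unchanged to step (3) if the list holds the per-function files split out of the power module. A linear scan costs one full run per entry, which matters when each run takes hours.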