Hi Ole,
On 28/09/2023 10:45, Ole Holm Nielsen wrote:
Dear Kenneth,
On 9/28/23 09:42, Kenneth Hoste wrote:
I suspect the problem is more with OpenBLAS than GCC.
OpenBLAS 0.3.20 probably doesn't detect AMD Genoa (Zen4) correctly
yet, and doesn't try to use AVX-512 instructions there.
OpenBLAS 0.3.21 detects Genoa and enables AVX-512, but there's a bug in
one of the kernels being used.
I would try and see whether you observe any problems with more recent
OpenBLAS versions, like OpenBLAS-0.3.23-GCC-12.3.0.eb .
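
A minimal sketch for checking which kernel target a given OpenBLAS build reports on a node. It calls openblas_get_corename() from the shared library through ctypes. The default library name is an assumption (with EasyBuild the file lives under the module's lib directory), and the reported name only reflects runtime CPU detection for DYNAMIC_ARCH builds; otherwise it is the build-time target.

```python
import ctypes


def openblas_core(libpath="libopenblas.so"):
    """Return the core name reported by OpenBLAS, or None if unavailable.

    libpath is an assumed default; pass the full path to the library
    from the OpenBLAS installation you want to inspect.
    """
    try:
        lib = ctypes.CDLL(libpath)
        get_corename = lib.openblas_get_corename
    except (OSError, AttributeError):
        # Library could not be loaded, or the symbol is missing
        # (for example in a build with a symbol prefix/suffix).
        return None
    get_corename.restype = ctypes.c_char_p
    return get_corename().decode()
```

Calling openblas_core() with the path to each installed libopenblas.so on the Genoa node would show whether 0.3.20 and 0.3.21 pick different kernel targets there.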
That version builds correctly:
$ eb OpenBLAS-0.3.23-GCC-12.3.0.eb -r
(lines deleted)
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.23-GCC-12.3.0.eb
== building and installing OpenBLAS/0.3.23-GCC-12.3.0...
== fetching files...
== ... (took 6 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 57 secs)
== testing...
== ... (took 2 mins 34 secs)
== installing...
== ... (took 2 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 3 mins 42 secs)
== Results of the build can be found in the log file(s)
/home/modules/software/OpenBLAS/0.3.23-GCC-12.3.0/easybuild/easybuild-OpenBLAS-0.3.23-20230928.103500.log
== Build succeeded for 22 out of 22
If not, we may be able to track down the fix and patch OpenBLAS 0.3.21
to fix the problem you're seeing...
So is there any hope that foss-2022b.eb with OpenBLAS/0.3.21-GCC-12.2.0
can be made to work correctly on AMD Genoa nodes?
Not seeing the problem with OpenBLAS 0.3.23 is encouraging: it probably
means a fix is hiding in either OpenBLAS 0.3.22 or 0.3.23 that we may be
able to backport to 0.3.21.
At first glance I don't see anything obvious in the release notes, though
(see https://github.com/OpenMathLib/OpenBLAS/releases).
Can you try and see if there's a problem with OpenBLAS 0.3.22, by using:
eb --try-software-version 0.3.22 OpenBLAS-0.3.23-GCC-12.3.0.eb
That would help narrow things down (a bit).
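
A minimal sketch of how that narrowing-down could be scripted if more than one intermediate version needs checking. It only uses the eb invocation quoted above; the function names and the dry-run default are assumptions made for illustration.

```python
import subprocess

# Easyconfig named in this thread; used as the starting point for --try-*.
EASYCONFIG = "OpenBLAS-0.3.23-GCC-12.3.0.eb"


def eb_command(version, easyconfig=EASYCONFIG):
    """Build the eb command line for trying a different software version."""
    return ["eb", "--try-software-version", version, easyconfig]


def try_versions(versions, run=False):
    """Print (or, with run=True, execute) one eb command per version.

    Returns a dict mapping version to exit code, or to None in dry-run mode.
    """
    results = {}
    for version in versions:
        cmd = eb_command(version)
        if run:
            results[version] = subprocess.run(cmd).returncode
        else:
            print(" ".join(cmd))
            results[version] = None
    return results
```

With the default run=False, try_versions(["0.3.22"]) just prints the command so it can be checked before anything is built.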
regards,
Kenneth
Thanks,
Ole
On 28/09/2023 09:26, Ole Holm Nielsen wrote:
It's interesting that when building the foss-2022a toolchain instead
of foss-2022b, the build of OpenBLAS with GCC 11.3.0 succeeds
without errors:
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.20-GCC-11.3.0.eb
== building and installing OpenBLAS/0.3.20-GCC-11.3.0...
== fetching files...
== ... (took 4 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 56 secs)
== testing...
== ... (took 2 mins 24 secs)
== installing...
== ... (took 1 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 3 mins 28 secs)
The only difference here appears to be GCC version 12.2.0 versus 11.3.0!
Any ideas about what's causing this error in the tests?
Perhaps GCC version 12.2.0 tries to use the new AVX-512 instructions
in AMD Genoa and has a bug?
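
A minimal sketch for confirming which AVX-512 feature flags the kernel reports on a node, assuming a Linux /proc/cpuinfo layout with a "flags" line. The function name is hypothetical.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in cpuinfo-style text.

    Only the first "flags" line is inspected, on the assumption that all
    cores of the node report the same feature set.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []
```

On the node itself this would be called as avx512_flags(open("/proc/cpuinfo").read()); an empty list means no AVX-512 flags are reported.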
Thanks,
Ole
On 9/26/23 08:04, Ole Holm Nielsen wrote:
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD
EPYC 9124 16-Core Processor with 2 threads/core, 384 GB RAM, and
AlmaLinux 8.8 OS.
Unfortunately, building the foss-2022b toolchain exits during the
testing phase of OpenBLAS-0.3.21-GCC-12.2.0.eb as shown below. Does
anyone have ideas about what might be wrong?
$ eb foss-2022b.eb -r
(lines deleted)
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb
== building and installing OpenBLAS/0.3.21-GCC-12.2.0...
== fetching files...
== ... (took 7 secs)
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== ... (took 53 secs)
== testing...
== ... (took 12 secs)
== FAILED: Installation ended unsuccessfully (build directory:
/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0): build failed (first 300
chars): cmd " make tests BINARY='64' CC='gcc' FC='gfortran'
MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' " exited with exit
code 2 and output:
/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning:
/tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies
executable stack
/ (took 1 min 14 secs)
== Results of the build can be found in the log file(s)
/tmp/eb-74m3kzgo/easybuild-OpenBLAS-0.3.21-20230925.161149.UfDUO.log
ERROR: Build of
/home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.21-GCC-12.2.0.eb failed (err: 'build failed (first 300 chars): cmd " make tests BINARY=\'64\' CC=\'gcc\' FC=\'gfortran\' MAKE_NB_JOBS=\'-1\' USE_OPENMP=\'1\' USE_THREAD=\'1\' " exited with exit code 2 and output:\n/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning: /tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies executable stack\n/')
The log file shows an error in test_kernel_regress.c:50:
(lines deleted)
./openblas_utest
TEST 1/37 max:smax_zero [OK]
TEST 2/37 max:dmax_positive [OK]
TEST 3/37 max:smax_negative [OK]
TEST 4/37 min:smin_zero [OK]
TEST 5/37 min:dmin_positive [OK]
TEST 6/37 min:smin_negative [OK]
TEST 7/37 amax:damax [OK]
TEST 8/37 amax:samax [OK]
TEST 9/37 ismax:negative_step_2 [OK]
TEST 10/37 ismax:positive_step_2 [OK]
TEST 11/37 ismin:negative_step_2 [OK]
TEST 12/37 ismin:positive_step_2 [OK]
TEST 13/37 drotmg:drotmg_D1_big_D2_big_flag_zero [OK]
TEST 14/37 drotmg:rotmg_D1eqD2_X1eqX2 [OK]
TEST 15/37 drotmg:rotmg_issue1452 [OK]
TEST 16/37 drotmg:rotmg [OK]
TEST 17/37 axpy:caxpy_inc_0 [OK]
TEST 18/37 axpy:saxpy_inc_0 [OK]
TEST 19/37 axpy:zaxpy_inc_0 [OK]
TEST 20/37 axpy:daxpy_inc_0 [OK]
TEST 21/37 zdotu:zdotu_offset_1 [OK]
TEST 22/37 zdotu:zdotu_n_1 [OK]
TEST 23/37 dsdot:dsdot_n_1 [OK]
TEST 24/37 swap:cswap_inc_0 [OK]
TEST 25/37 swap:sswap_inc_0 [OK]
TEST 26/37 swap:zswap_inc_0 [OK]
TEST 27/37 swap:dswap_inc_0 [OK]
TEST 28/37 rot:csrot_inc_0 [OK]
TEST 29/37 rot:srot_inc_0 [OK]
TEST 30/37 rot:zdrot_inc_0 [OK]
TEST 31/37 rot:drot_inc_0 [OK]
TEST 32/37 dnrm2:dnrm2_tiny [OK]
TEST 33/37 dnrm2:dnrm2_inf [OK]
TEST 34/37 potrf:smoketest_trivial [OK]
TEST 35/37 potrf:bug_695 [OK]
TEST 36/37 kernel_regress:skx_avx [FAIL]
ERR: test_kernel_regress.c:50 expected 0.000e+00, got 6.734e+01
(diff -6.734e+01, tol 1.000e-10)
TEST 37/37 fork:safety_after_fork_in_parent [OK]
RESULTS: 37 tests (36 ok, 1 failed, 0 skipped) ran in 3 ms
make[1]: *** [Makefile:52: run_test] Error 1
make[1]: Leaving directory
'/dev/shm/OpenBLAS/0.3.21/GCC-12.2.0/OpenBLAS-0.3.21/utest'
make: *** [Makefile:150: tests] Error 2
(at easybuild/tools/run.py:681 in parse_cmd_output)
== 2023-09-25 16:13:04,292 build_log.py:267 INFO ... (took 12 secs)
== 2023-09-25 16:13:04,292 filetools.py:2012 INFO Removing lock
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.21-GCC-12.2.0.lock...
== 2023-09-25 16:13:04,293 filetools.py:383 INFO Path
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.21-GCC-12.2.0.lock successfully removed.
== 2023-09-25 16:13:04,293 filetools.py:2016 INFO Lock removed:
/home/modules/software/.locks/_home_modules_software_OpenBLAS_0.3.21-GCC-12.2.0.lock
== 2023-09-25 16:13:04,293 easyblock.py:4277 WARNING build failed
(first 300 chars): cmd " make tests BINARY='64' CC='gcc'
FC='gfortran' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' "
exited with exit code 2 and output:
/home/modules/software/binutils/2.39-GCCcore-12.2.0/bin/ld: warning:
/tmp/eb-74m3kzgo/ccy1Gkzg.o: missing .note.GNU-stack section implies
executable stack
/
== 2023-09-25 16:13:04,293 easyblock.py:328 INFO Closing log for
application name OpenBLAS version 0.3.21
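
A minimal sketch for pulling the failing test names out of a log like the one above, assuming the "TEST n/m name [STATUS]" line format shown in the openblas_utest output. The function name is hypothetical.

```python
import re

# Matches lines such as: TEST 36/37 kernel_regress:skx_avx [FAIL]
TEST_LINE = re.compile(r"^TEST\s+\d+/\d+\s+(\S+)\s+\[(\w+)\]")


def failed_tests(log_text):
    """Return the names of all tests whose status is FAIL, in log order."""
    failures = []
    for line in log_text.splitlines():
        match = TEST_LINE.match(line.strip())
        if match and match.group(2) == "FAIL":
            failures.append(match.group(1))
    return failures
```

Run against the EasyBuild log file from the failed build, this makes it easy to compare failures between OpenBLAS versions or between nodes.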
Thanks,
Ole