https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|libfortran                  |testsuite
   Last reconfirmed|                            |2023-12-21
             Target|powerpc64le-linux-gnu       |
                 CC|                            |burnus at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Turns out this isn't actually specific to powerpc64le-linux-gnu.  Rather, the
testing where I saw the timeouts was not build-tree 'make check' testing but
"installed" testing (where you invoke 'runtest' on a 'make install'ed GCC
tree).  In that case, r266482 "Tweak libgomp env vars in parallel make check
(take 2)" is not in effect, that is, nothing limits the tests to
'OMP_NUM_THREADS=8'.
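
For installed testing, a similar limit could be applied by hand before invoking
'runtest'.  The following is only a minimal sketch, not what r266482 does
internally: the helper name 'cap_threads' is made up for illustration, and it
assumes that the test executables inherit 'OMP_NUM_THREADS' from the
environment, which should hold for native testing.

```shell
# Hypothetical helper: print the smaller of a limit and a CPU count.
cap_threads() {
    limit=$1
    ncpu=$2
    if [ "$ncpu" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$ncpu"
    fi
}

# Number of online CPUs; fall back to 1 if 'getconf' cannot report it.
ncpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Use at most 8 threads, mirroring the limit mentioned above.
OMP_NUM_THREADS=$(cap_threads 8 "$ncpus")
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

On the 256-CPU system below this would export 'OMP_NUM_THREADS=8'; on a system
with fewer than 8 CPUs it keeps the CPU count.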

For example, manually running the '-O0' variant of
'libgomp.fortran/rwlock_1.f90' on a "big-iron" x86_64-pc-linux-gnu system:

    $ grep ^model\ name < /proc/cpuinfo | uniq -c
        256 model name  : AMD EPYC 7V13 64-Core Processor
    $ \time env OMP_NUM_THREADS=[...] LD_LIBRARY_PATH=[...] ./rwlock_1.exe

..., I produce the following data on an idle system:

'OMP_NUM_THREADS=8':

    0.16user 0.56system 0:02.36elapsed 31%CPU (0avgtext+0avgdata 4452maxresident)k
    0.17user 0.54system 0:02.30elapsed 30%CPU (0avgtext+0avgdata 4532maxresident)k

'OMP_NUM_THREADS=16':

    0.40user 1.03system 0:04.52elapsed 31%CPU (0avgtext+0avgdata 5832maxresident)k
    0.49user 0.99system 0:04.39elapsed 33%CPU (0avgtext+0avgdata 5876maxresident)k

'OMP_NUM_THREADS=32':

    0.98user 2.36system 0:09.33elapsed 35%CPU (0avgtext+0avgdata 8528maxresident)k
    0.98user 2.25system 0:09.02elapsed 35%CPU (0avgtext+0avgdata 8548maxresident)k

'OMP_NUM_THREADS=64':

    1.82user 5.83system 0:18.44elapsed 41%CPU (0avgtext+0avgdata 13952maxresident)k
    1.54user 6.03system 0:18.22elapsed 41%CPU (0avgtext+0avgdata 13996maxresident)k

'OMP_NUM_THREADS=128':

    3.71user 12.41system 0:38.02elapsed 42%CPU (0avgtext+0avgdata 24376maxresident)k
    3.96user 12.52system 0:39.34elapsed 41%CPU (0avgtext+0avgdata 24476maxresident)k

'OMP_NUM_THREADS=256' (or not set, for that matter):

    9.65user 25.19system 1:20.93elapsed 43%CPU (0avgtext+0avgdata 45816maxresident)k
    8.99user 25.82system 1:19.40elapsed 43%CPU (0avgtext+0avgdata 45636maxresident)k

For comparison, if I remove 'LD_LIBRARY_PATH', such that the system-wide GCC 10
libraries are used, I get for the 'OMP_NUM_THREADS=256' case:

    9.28user 24.54system 1:22.09elapsed 41%CPU (0avgtext+0avgdata 45588maxresident)k
    11.26user 24.51system 1:24.32elapsed 42%CPU (0avgtext+0avgdata 45712maxresident)k

..., so the new "rwlock" libgfortran is only a little faster than the old
"mutex" one from GCC 10, curiously.  (But presumably that depends on the
hardware or other factors?)
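
To put a number on the scaling, the first elapsed time of each pair above (the
runs with 'LD_LIBRARY_PATH' set) can be fed through a short script that prints
the slowdown per doubling of the thread count.  This is just a sketch: the
values are copied from the timings above (with 1:20.93 written as 80.93
seconds), and the function names are made up for illustration.

```shell
# "threads elapsed-seconds", first run per setting, copied from the timings above.
elapsed_data() {
    printf '%s\n' '8 2.36' '16 4.52' '32 9.33' '64 18.44' '128 38.02' '256 80.93'
}

# Print the elapsed-time ratio between each pair of consecutive thread counts.
scaling_ratios() {
    awk 'NR > 1 { printf "%d -> %d threads: x%.2f elapsed\n", prev_n, $1, $2 / prev_t }
         { prev_n = $1; prev_t = $2 }'
}

elapsed_data | scaling_ratios
```

Each ratio comes out close to 2, that is, elapsed time grows about linearly
with the thread count on this system.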

Anyway: should these test cases limit themselves to a lower number of threads,
for example via 'num_threads' clauses?
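
To judge which lower limit would be adequate on a given machine, the timing
series above can be re-run in a loop.  A minimal sketch follows; the helper
name 'sweep_threads' is made up, timing via 'date +%s' only has whole-second
resolution (unlike the '\time' output above), and 'LD_LIBRARY_PATH' is assumed
to be set up by the caller as needed.

```shell
# Hypothetical helper: run a command once per thread count and time each run.
# Usage: sweep_threads "8 16 32" command [args...]
sweep_threads() {
    counts=$1
    shift
    for n in $counts; do
        echo "OMP_NUM_THREADS=$n"
        start=$(date +%s)
        # Run the command with only OMP_NUM_THREADS changed in its environment.
        env OMP_NUM_THREADS="$n" "$@"
        end=$(date +%s)
        echo "elapsed: $((end - start))s"
    done
}

# Stand-in command that just reports the setting it was given.
sweep_threads "8 16" sh -c 'echo "requested threads: $OMP_NUM_THREADS"'
```

For the test case at hand, the invocation would be something like
'sweep_threads "8 16 32 64 128 256" ./rwlock_1.exe'.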

The powerpc64le-linux-gnu systems in question have the following CPU counts:

    $ grep ^cpu < /proc/cpuinfo | uniq -c

    160 cpu             : POWER8 (raw), altivec supported

    152 cpu             : POWER8NVL (raw), altivec supported

    128 cpu             : POWER9, altivec supported
