On 9/4/24 12:55, Jan Hubicka wrote:

On 9/3/24 15:07, Jan Hubicka wrote:

Hi,
We disable gathers for zen4.  It seems that gather has improved a bit on Zen5
compared to zen4, and the Zen5 optimization manual suggests "Avoid GATHER
instructions when the indices are known ahead of time. Vector loads followed by
shuffles result in a higher load bandwidth."  However, the situation seems to be
more complicated.

A small bit of "real world" experience (but for zen3):

Recently I switched to gfortran 14.2 for my weather forecasting.
A year ago I had changed "-march=native -mtune=native" (on my zen3 system)
to "-march=native -mtune=znver2" while using gfortran 13 - it had only a
small effect (but positive).

Last Monday I switched back to "-march=native -mtune=native", but that
consistently made a 12 hour computation around 6 minutes slower (i.e., about
1/120th, or 0.8 %). The most computationally intensive part of the code needs
gathers (either the instructions or inline expansions of them).

It would be nice to know what is causing this. Gathers can be enabled
using -mtune-ctrl=use_gather and I would be happy to know about real
world situations where they help.

Ah - one detail that I forgot to mention: our code is "special" in the sense that it uses 32-bit floats while it runs in a 64-bit address space.

So its use of gather instructions is rather suboptimal, needing 2 gather instructions for each actual "gather operation".
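
To illustrate the kind of loop meant here, a minimal sketch in C (the actual code is Fortran and is not shown in this thread; the function names below are hypothetical). The assumption is that the indices are 64 bits wide while the data are 32-bit floats: a vector of indices then covers only half as many elements as a vector of floats of the same width, so filling one full vector of floats plausibly takes two gather instructions. The 32-bit index variant is included only for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: gather 32-bit floats through 64-bit indices.
 * With 256-bit vectors, one index vector holds 4 indices but one data
 * vector holds 8 floats, so a vectorizer would need two gathers per
 * full vector of results (assumption about the generated code). */
void gather_f32_idx64(float *restrict dst, const float *restrict src,
                      const int64_t *restrict idx, size_t n)
{
    assert(dst != NULL && src != NULL && idx != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Comparison kernel: same gather, but with 32-bit indices, so index
 * and data elements have the same width. */
void gather_f32_idx32(float *restrict dst, const float *restrict src,
                      const int32_t *restrict idx, size_t n)
{
    assert(dst != NULL && src != NULL && idx != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}
```

Compiling such a file with and without -mtune-ctrl=use_gather (for example with -O3 -march=native -S) and comparing the assembly should show whether gather instructions or inline expansions are used; whether the loop is vectorized at all depends on the GCC version and its cost model.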

Hope this helps,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
