On Mon, 9 Nov 2020 at 17:21, Étienne Mollier <etienne.moll...@mailoo.org> wrote:
> Howdy,
>
> I'm filing a wishlist item in the bug tracker, so that the
> discussion does not disappear inside mail archives.  I tried
> the shapeit4 autopkgtest suite with and without FMA & AVX2
> support, but it had a run time of 1m25s in both cases on my
> machine (Ryzen 5 3600 w/ 6 cores).  It is quite possible I
> neglected some other bottlenecks, but the assembler did
> embed AVX2 instructions when I checked the build result.  Out of
> curiosity, does anyone have figures on the performance gain for
> that software when extensions are available?
>
> Michael R. Crusoe, on 2020-11-05 21:26:30 +0100:
> > As documented at
> > https://wiki.debian.org/SIMDEverywhere
>
> shapeit4 provides a dedicated code path for "-mfma -mavx2" build
> options, and another one for generic builds.  Is it still worth
> using SIMDe in this particular situation?  The "use case"
> paragraph of the wiki page seems to suggest it is not strictly
> needed here.
>

Given the fallback route that doesn't use SIMD, implementing our
own is not necessary. However, compiling the FMA+AVX2 path using
SIMDe on non-x86 archs may result in a speedup for them.

It would be best to get a bigger training dataset to confirm the
benefit, or at least the lack of regression :-)

If a performance benefit is observed, it might be interesting to see
if the AVX-only and "lower" SIMD levels on x86 also experience a
speedup.
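
To illustrate the kind of layout being discussed, here is a minimal
sketch of a kernel with a dedicated FMA+AVX2 path and a generic
fallback. It is not taken from shapeit4: the dot() and dot_impl()
functions and the DOT_IMPL macro are hypothetical names made up for
this example. Built with "-mavx2 -mfma" it takes the intrinsics
path; built without those flags it takes the scalar loop.

```c
#include <stddef.h>

/* Compile-time selection, mirroring a "dedicated path vs. generic
 * path" layout. Everything here is a hypothetical illustration. */
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#define DOT_IMPL "avx2+fma"
#else
#define DOT_IMPL "generic"
#endif

/* Dot product of two float arrays of length n. */
float dot(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;

#if defined(__AVX2__) && defined(__FMA__)
    /* Vector path: 8 floats per iteration, fused multiply-add. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                              _mm256_loadu_ps(b + i), acc);
    }
    /* Horizontal sum of the 8 accumulator lanes. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; k++) {
        sum += lanes[k];
    }
#endif

    /* Generic path, also used for the tail of the vector path. */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Reports which path was compiled in, handy when benchmarking. */
const char *dot_impl(void)
{
    return DOT_IMPL;
}
```

With SIMDe, my understanding is that the vector branch could be kept
on non-x86 archs by including its AVX2 and FMA headers in place of
immintrin.h and using the simde_-prefixed equivalents of the same
intrinsics; I have not tried this on shapeit4 itself, so treat that
as an assumption to verify against the SIMDe documentation.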