On 17.11.2016 at 00:20, Jakub Jelinek wrote:
On Thu, Nov 17, 2016 at 12:03:18AM +0100, Thomas Koenig wrote:
Don't you need to test in configure if the assembler supports AVX?
Otherwise, if somebody is bootstrapping gcc with an older assembler, it will
just fail to bootstrap.
That's a good point. The AVX instructions were added in binutils 2.19,
which was released in 2008. This could be put in the prerequisites.
What should the test do? Fail with an error message "you need newer
binutils" or simply (and silently) not compile the AVX version?
From what I understood, you want those functions just to be implementation
details, not exported from libgfortran.so*. Thus the test would do
something similar to what gcc/testsuite/lib/target-supports.exp
(check_effective_target_avx)
does, but of course in the autoconf way, not in tcl.
OK, that looks straightforward enough. I'll give it a shot.
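
For illustration, a minimal sketch of the kind of source file such a
configure check could try to compile. If the assembler does not know
the VEX-encoded instructions, the compile step fails and the AVX
version can be left out. The names HAVE_AVX_PROBE, avx_probe_add and
avx_probe_selftest are made up for this sketch; they are not taken
from target-supports.exp or from libgfortran.

```c
/* Probe source a configure check could compile (hypothetical names). */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
# define HAVE_AVX_PROBE 1

/* Built for AVX regardless of the global -m flags.  With AVX enabled,
   the floating-point code here is emitted VEX-encoded, so the
   assembler has to understand AVX for this file to compile.  */
__attribute__((target("avx")))
void avx_probe_add(double *c, const double *a, const double *b, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
#else
# define HAVE_AVX_PROBE 0
#endif

/* Optional runtime check, only for trying the sketch out; a configure
   test would stop after the compile step.  Returns 1 on success or
   when the check is skipped because AVX is unavailable.  */
int avx_probe_selftest(void)
{
#if HAVE_AVX_PROBE
  if (__builtin_cpu_supports("avx")) {
    double a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    avx_probe_add(c, a, b, 4);
    return c[0] == 11.0 && c[3] == 44.0;
  }
#endif
  return 1;
}
```

In configure this would presumably be wrapped in a compile-only test
that defines a macro on success, so the silent fallback is just an
#ifdef around the AVX version.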
Also, from what I see, target_clones just use IFUNCs, so you probably also
need some configure test whether ifuncs are supported (the
gcc.target/i386/mvc* tests use dg-require-ifunc, so you'd need something
similar again in configure). But if so, then I have no idea why you use
a wrapper around the function, instead of using it on the exported APIs.
As you wrote above, I wanted this as an implementation detail. I also
wanted the ability to be able to add new instruction sets without
breaking the ABI.
Because the caller generates the ifunc, using a wrapper function seemed
like the best way to do it. The overhead is negligible (the function
is one simple jump), especially considering that we only call the
library function for larger matrices.
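
A rough sketch of the wrapper arrangement, not the actual patch: the
names matmul_r8, matmul_r8_inner and MATMUL_CLONES are placeholders,
and the guard assumes GCC 6 or later on x86-64 with glibc (so that
IFUNCs are available). Elsewhere the macro expands to nothing and only
the default version is built.

```c
#include <stdlib.h>  /* only needed so that __GLIBC__ is defined on glibc */

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 \
    && defined(__x86_64__) && defined(__GLIBC__)
# define MATMUL_CLONES __attribute__((target_clones("avx", "default")))
#else
# define MATMUL_CLONES  /* no target_clones: single default version */
#endif

/* Internal worker.  With target_clones the compiler emits one copy per
   listed target plus an IFUNC resolver that picks one at load time.
   In a real library this symbol would be kept out of the exported
   list.  Computes C = A * B for n x n row-major matrices.  */
MATMUL_CLONES
void matmul_r8_inner(double *c, const double *a, const double *b, int n)
{
  for (int i = 0; i < n * n; i++)
    c[i] = 0.0;
  for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++) {
      double aik = a[i * n + k];
      for (int j = 0; j < n; j++)
        c[i * n + j] += aik * b[k * n + j];
    }
}

/* Exported entry point.  Its name and signature stay fixed, so adding
   another instruction set to MATMUL_CLONES later does not change the
   ABI seen by callers.  */
void matmul_r8(double *c, const double *a, const double *b, int n)
{
  matmul_r8_inner(c, a, b, n);
}
```

Callers only ever see matmul_r8; which clone runs behind it is decided
inside the library.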
For matmul_i*, wouldn't it make more sense to use avx2 instead of avx,
or both avx and avx2 and maybe avx512f?
I did a vdiff of the disassembled code generated for avx and avx2, and
(somewhat to my surprise) there was no difference. Maybe, with more
unrolling, something more might have happened. I didn't check for
AVX512f, but I can do that.
For the float/double code it wouldn't surprise me (assuming you don't need
gather insns and similar stuff). But for integers generally most of the
avx instructions can only handle 128-bit vectors, while avx2 has 256-bit
ones,
You're right - integer multiplication looks different.
Nobody I know cares about integer matrix multiplication
speed, whereas real has gotten a _lot_ of attention over
the decades. So, putting in AVX will make the code run
faster on more machines, while putting in AVX2 will
(IMHO) bloat the library for no good reason. However,
I am willing to stand corrected on this. Putting in AVX512f
makes sense.
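
To make the integer case concrete, here is a small sketch for
comparing what the compiler generates for avx and avx2 on a 32-bit
integer kernel. It uses explicit per-function target attributes and a
manual runtime check instead of target_clones, purely so that the two
variants have separate names in the disassembly. All names (dot_i4 and
friends, TARGET_ISA, CPU_HAS, DOT_BODY) are invented for this example,
and the code actually generated depends on compiler version and
optimization flags.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
# define TARGET_ISA(x) __attribute__((target(x)))
# define CPU_HAS(x) __builtin_cpu_supports(x)
#else
# define TARGET_ISA(x)
# define CPU_HAS(x) 0
#endif

/* Same loop body for every variant; only the enabled ISA differs.  */
#define DOT_BODY                      \
  int32_t sum = 0;                    \
  for (int i = 0; i < n; i++)         \
    sum += a[i] * b[i];               \
  return sum;

static int32_t dot_i4_default(const int32_t *a, const int32_t *b, int n)
{ DOT_BODY }

/* AVX only: integer vector operations are limited to 128 bits.  */
TARGET_ISA("avx")
static int32_t dot_i4_avx(const int32_t *a, const int32_t *b, int n)
{ DOT_BODY }

/* AVX2: 256-bit integer vector operations become available.  */
TARGET_ISA("avx2")
static int32_t dot_i4_avx2(const int32_t *a, const int32_t *b, int n)
{ DOT_BODY }

/* Manual dispatch on the running CPU, best ISA first.  */
int32_t dot_i4(const int32_t *a, const int32_t *b, int n)
{
  if (CPU_HAS("avx2"))
    return dot_i4_avx2(a, b, n);
  if (CPU_HAS("avx"))
    return dot_i4_avx(a, b, n);
  return dot_i4_default(a, b, n);
}
```

Compiling this with optimization and looking at the disassembly of
dot_i4_avx against dot_i4_avx2 is one way to see whether the extra
variant buys anything for a given kernel.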
I have also been trying to get target_clones to work on POWER
to get Altivec instructions, but to no avail. I also cannot
find any examples in the testsuite.
Since a lot of supercomputers use POWER nodes, that might also
be attractive.
Regards
Thomas