Hi,
I was investigating a performance issue with Neon intrinsics and
realized this needed to happen.
Patch 1/3 does this. I've special-cased -ffast-math for the _f32
intrinsics to prevent the auto-vectorizer from coming along and
vectorizing addv2sf- and addv4sf-type operations, which we don't want
to happen by default. Patch 1/3 causes apparent "regressions" in the
rather ineffective Neon intrinsics tests that we currently carry, which
will hopefully soon be replaced by Christophe Lyon's rewrite that is
being reviewed. On the whole I deem this patch stack to be safe to go
in. These "regressions" are at -O0 with the vbic and vorn intrinsics,
which no longer get combined and, well, so be it.
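
For illustration only, here is a minimal sketch of the shape of change
described above. It assumes the simple intrinsics are re-expressed with
GNU C vector operators and that the _f32 special case is keyed on the
compiler's predefined __FAST_MATH__ macro. The sketch_* names and types
are hypothetical stand-ins and are not taken from arm_neon.h or from
the patch itself; the non-fast-math branch only models "keep the
existing builtin call" with a lane-by-lane add.

```c
#include <stdint.h>

/* Hypothetical stand-ins for float32x2_t / uint32x2_t; the real types
   live in arm_neon.h.  These use the generic GNU C vector extension so
   the sketch builds with GCC or Clang on any target.  */
typedef float    sketch_f32x2 __attribute__ ((vector_size (8)));
typedef uint32_t sketch_u32x2 __attribute__ ((vector_size (8)));

/* Bit-clear written with plain vector operators.  With optimization
   the AND and NOT can be combined into one instruction; at -O0 no
   combine pass runs, so they stay as two separate operations.  */
static inline sketch_u32x2
sketch_vbic_u32 (sketch_u32x2 a, sketch_u32x2 b)
{
  return a & ~b;
}

/* Float add: only exposed as a plain vector '+' under -ffast-math.  */
static inline sketch_f32x2
sketch_vadd_f32 (sketch_f32x2 a, sketch_f32x2 b)
{
#ifdef __FAST_MATH__
  return a + b;
#else
  /* Assumption: stands in for the opaque builtin call that the
     default path would keep.  */
  sketch_f32x2 r = { a[0] + b[0], a[1] + b[1] };
  return r;
#endif
}
```

Both branches of sketch_vadd_f32 compute the same lane values; the
difference is only in what the optimizers get to see.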
This then left us in the happy position of being able to delete code,
but I was worried about LTO streaming, as these "builtins" are
essentially streamed out in the LTO object code format. Since we make
no promises about LTO compatibility across releases, deleting them is
safe, but I have structured the dead code elimination as a separate
Patch 2/3. It will be committed separately in case folks want to
backport Patch 1/3 on its own and still assure their users of LTO
compatibility within a release branch (if that even works :) ).
Patch 3/3 removes the ML used to generate the Neon intrinsics and the
documentation, and updates the comments in the files to show that these
are now hand-crafted rather than auto-generated. We've had these
generators for many years now and I think it's time we got rid of them.
Not everyone groks ML, and it doesn't help that only one or two folks
can actually do this properly every time. Given these bottlenecks and
the fact that the intrinsics are pretty stable now, there's no point in
retaining the generator interface. The only bits left are
neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we can safely
remove neon-testgen.ml once Christophe's testsuite is done, and we'll
probably just have to carry neon-schedgen.ml / neon.ml as they still
generate the Neon descriptions for both a8 and a9.
The patch stack was caught up in the C++ type info mess recently and
I've tested this on a cross arm-linux-gnueabihf testsuite run and it
looks ok modulo the issues mentioned for Patch 1/3. I've deliberately
resisted deleting the entire gcc.target/arm/neon and neon-testgen.ml in
the hope that Christophe's testsuite will do the honours at that point
:). Given we're in stage 1 and that I think we're getting somewhere
with clyon's testsuite, I feel it is reasonably practical to just
carry the noise from these extra failures. Christophe and I will
test-drive his testsuite work in this space with these patches to see
how the conversion process works and whether there are any issues with
these patches.
If there are issues I'm happy to hear about them.
I will apply this to trunk in a couple of days if there are no
regressions with clyon's testsuite for these intrinsics.
regards
Ramana
--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.