Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00232.html

Thanks,
Kyrill
On 04/02/15 12:12, Kyrill Tkachov wrote:
Hi all,

This patch improves the vc<cond> patterns in neon.md to use proper RTL
operations rather than UNSPECs.
It is done in a similar way to the analogous aarch64 operations, i.e.
vceq is expressed as
(neg (eq (...) (...)))
since we want to write all 1s to the result element when 'eq' holds and
0s otherwise.
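
To illustrate why the negation gives the wanted mask, here is a minimal C
sketch of a single 32-bit lane. The helper name lane_vceq_u32 is made up
for this example and is not part of the patch; it only models the
arithmetic, not the RTL or the instruction itself.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): models one 32-bit lane of
   vceq.  The C comparison yields 1 or 0; negating that gives -1 (all
   bits set) or 0, which is the idea behind (neg (eq (...) (...))).  */
static inline uint32_t lane_vceq_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)-(int32_t)(a == b);
}
```

For equal inputs the lane is 0xFFFFFFFF, for unequal inputs it is 0.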

The catch is that the floating-point comparisons can only be expanded to
the RTL codes when -funsafe-math-optimizations is given and they must
continue to use the UNSPECs otherwise.
For this I've created a define_expand that generates
the correct RTL depending on -funsafe-math-optimizations and two
define_insns to match the result: one using the RTL codes and one using
UNSPECs.

I've also compressed some of the patterns together using iterators for
the [eq gt ge le lt] cases.
NOTE: for le and lt before this patch we would never generate
'vclt.<type> dm, dn, dp' instructions, only 'vclt.<type> dm, dn, #0'.
With this patch we can now generate 'vclt.<type> dm, dn, dp' assembly.
According to the ARM ARM this is just a pseudo-instruction that maps to
vcgt with the operands swapped around.
I've confirmed that gas supports this code.
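
The operand swap can be sketched per lane in C as follows. This is only
a scalar model under my own naming (lane_vcgt_f32 and lane_vclt_f32 are
hypothetical helpers, not anything in the patch or in gas).

```c
#include <stdint.h>

/* Hypothetical scalar model of one vcgt.f32 lane: all 1s when a > b.  */
static inline uint32_t lane_vcgt_f32(float a, float b)
{
    return (uint32_t)-(int32_t)(a > b);
}

/* 'a < b' is the same predicate as 'b > a', so the less-than form can
   be modelled as the greater-than form with its operands swapped.  */
static inline uint32_t lane_vclt_f32(float a, float b)
{
    return lane_vcgt_f32(b, a);
}
```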

The vcage and vcagt patterns are rewritten to use the form:
(neg
    (<cond>
      (abs (...))
      (abs (...))))

and condensed together using iterators as well.
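
As a rough per-lane picture of that form, a C sketch could look like the
one below. The helper names are invented for illustration, and abs_f32
is a stand-in for the RTL abs written by hand so the sketch needs no
math library.

```c
#include <stdint.h>

/* Hypothetical stand-in for (abs (...)) on one float lane.  */
static inline float abs_f32(float x)
{
    return x < 0.0f ? -x : x;
}

/* Models one vcage.f32 lane: all 1s when |a| >= |b|, else 0.  */
static inline uint32_t lane_vcage_f32(float a, float b)
{
    return (uint32_t)-(int32_t)(abs_f32(a) >= abs_f32(b));
}

/* Models one vcagt.f32 lane: all 1s when |a| > |b|, else 0.  */
static inline uint32_t lane_vcagt_f32(float a, float b)
{
    return (uint32_t)-(int32_t)(abs_f32(a) > abs_f32(b));
}
```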

Bootstrapped and tested on arm-none-linux-gnueabihf, made sure that the
advanced-simd-intrinsics testsuite is passing
(it did catch some bugs during development of this patch) and tried out
other NEON intrinsics codebases.

The test gcc.target/arm/neon/pr51534.c now generates 'vclt.<type> dn,
dm, #0' instructions where appropriate instead of the previous vmov of
#0 into a temp and then a 'vcgt.<type> dn, temp, dm'.
I think that is correct behaviour since the test was trying to make sure
that we didn't generate a .u<size>-typed comparison with #0, which is
what the PR was talking about (from what I can gather).

What do people think of this approach?
I'm proposing this for next stage1, of course.

Thanks,
Kyrill


2015-02-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

      * config/arm/iterators.md (GTGE, GTUGEU, COMPARISONS): New code
      iterators.
      (cmp_op, cmp_type): New code attributes.
      (NEON_VCMP, NEON_VACMP): New int iterators.
      (cmp_op_unsp): New int attribute.
      * config/arm/neon.md (neon_vc<cmp_op><mode>): New define_expand.
      (neon_vceq<mode>): Delete.
      (neon_vc<cmp_op><mode>_insn): New pattern.
      (neon_vc<cmp_op_unsp><mode>_insn_unspec): Likewise.
      (neon_vcgeu<mode>): Delete.
      (neon_vcle<mode>): Likewise.
      (neon_vclt<mode>): Likewise.
      (neon_vcage<mode>): Likewise.
      (neon_vcagt<mode>): Likewise.
      (neon_vca<cmp_op><mode>): New define_expand.
      (neon_vca<cmp_op><mode>_insn): New pattern.
      (neon_vca<cmp_op_unsp><mode>_insn_unspec): Likewise.

2015-02-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

      * gcc.target/arm/neon/pr51534.c: Update vcg* scan-assembly patterns
      to look for vcl* where appropriate.
