Ping. https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00232.html
Thanks,
Kyrill

On 04/02/15 12:12, Kyrill Tkachov wrote:
Hi all,

This patch improves the vc<cond> patterns in neon.md to use proper RTL operations rather than UNSPECs. It is done in a similar way to the analogous aarch64 operations, i.e. vceq is expressed as (neg (eq (...) (...))), since we want to write all 1s to the result element when 'eq' holds and 0s otherwise.

The catch is that the floating-point comparisons can only be expanded to the RTL codes when -funsafe-math-optimizations is given; otherwise they must continue to use the UNSPECs. For this I've created a define_expand that generates the correct RTL depending on -funsafe-math-optimizations, and two define_insns to match the result: one using the RTL codes and one using UNSPECs. I've also compressed some of the patterns together using iterators for the [eq gt ge le lt] cases.

NOTE: for le and lt, before this patch we would never generate 'vclt.<type> dm, dn, dp' instructions, only 'vclt.<type> dm, dn, #0'. With this patch we can now generate 'vclt.<type> dm, dn, dp' assembly. According to the ARM ARM this is just a pseudo-instruction that maps to vcgt with the operands swapped around. I've confirmed that gas supports this code.

The vcage and vcagt patterns are rewritten to use the form (neg (<cond> (abs (...)) (abs (...)))) and are condensed together using iterators as well.

Bootstrapped and tested on arm-none-linux-gnueabihf. I made sure that the advanced-simd-intrinsics testsuite is passing (it did catch some bugs during development of this patch) and tried out other NEON intrinsics codebases.

The test gcc.target/arm/neon/pr51534.c now generates 'vclt.<type> dn, dm, #0' instructions where appropriate, instead of the previous vmov of #0 into a temp followed by a 'vcgt.<type> dn, temp, dm'. I think that is correct behaviour, since the test was trying to make sure that we didn't generate a .u<size>-typed comparison with #0, which is what the PR was talking about (from what I can gather).

What do people think of this approach? I'm proposing this for next stage1, of course.
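To make the (neg (<cond> ...)) forms described above concrete, here is a rough scalar model in C of what one 32-bit result lane would hold. It is only a sketch and not part of the patch: the lane_* function names are hypothetical, it assumes the comparison itself yields 1 or 0 before the negation, and it says nothing about how the define_expand chooses between the RTL and UNSPEC forms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of vceq: the comparison yields
   1 or 0, and negating that gives all 1s (0xFFFFFFFF) or all 0s.  */
static uint32_t lane_vceq(int32_t a, int32_t b)
{
    return (uint32_t)-(a == b);
}

/* Same idea for vcgt.  */
static uint32_t lane_vcgt(int32_t a, int32_t b)
{
    return (uint32_t)-(a > b);
}

/* The register form of vclt modelled as vcgt with the operands swapped.  */
static uint32_t lane_vclt(int32_t a, int32_t b)
{
    return lane_vcgt(b, a);
}

/* Hand-rolled absolute value so the sketch needs nothing from libm.  */
static float lane_abs(float x)
{
    return x < 0.0f ? -x : x;
}

/* vcage modelled as (neg (ge (abs a) (abs b))).  */
static uint32_t lane_vcage(float a, float b)
{
    return (uint32_t)-(lane_abs(a) >= lane_abs(b));
}

/* A few self-checks of the model.  */
static void check_lane_models(void)
{
    assert(lane_vceq(7, 7) == UINT32_MAX);
    assert(lane_vceq(7, 8) == 0);
    assert(lane_vclt(1, 2) == lane_vcgt(2, 1));
    assert(lane_vcage(-3.0f, 2.0f) == UINT32_MAX);
}
```

Calling check_lane_models() exercises the all-1s/all-0s results and the operand swap used for vclt.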
Thanks,
Kyrill

2015-02-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * config/arm/iterators.md (GTGE, GTUGEU, COMPARISONS): New code
    iterators.
    (cmp_op, cmp_type): New code attributes.
    (NEON_VCMP, NEON_VACMP): New int iterators.
    (cmp_op_unsp): New int attribute.
    * config/arm/neon.md (neon_vc<cmp_op><mode>): New define_expand.
    (neon_vceq<mode>): Delete.
    (neon_vc<cmp_op><mode>_insn): New pattern.
    (neon_vc<cmp_op_unsp><mode>_insn_unspec): Likewise.
    (neon_vcgeu<mode>): Delete.
    (neon_vcle<mode>): Likewise.
    (neon_vclt<mode>): Likewise.
    (neon_vcage<mode>): Likewise.
    (neon_vcagt<mode>): Likewise.
    (neon_vca<cmp_op><mode>): New define_expand.
    (neon_vca<cmp_op><mode>_insn): New pattern.
    (neon_vca<cmp_op_unsp><mode>_insn_unspec): Likewise.

2015-02-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * gcc.target/arm/neon/pr51534.c: Update vcg* scan-assembly patterns
    to look for vcl* where appropriate.