Ping

________________________________________
From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf of Tamar Christina <tamar.christ...@arm.com>
Sent: Friday, September 30, 2016 2:22:35 PM
To: GCC Patches
Cc: nd; Richard Earnshaw; Wilco Dijkstra; ja...@redhat.com; Joseph Myers; Michael Meissner; rguent...@suse.de; Moritz Klammler; Andrew Pinski; l...@redhat.com
Subject: [PATCHv2][GCC] Optimise the fpclassify builtin to perform integer operations when possible

Hi All,

This is v2 of the patch which adds an optimized route to the fpclassify
builtin for floating point numbers which are similar to IEEE-754 in format.

I have addressed most comments from everyone except for two things:

1) Providing a back-end hook to override the functionality. While this is
   certainly possible, the current fpclassify doesn't provide one either,
   so I'd like to treat it as an enhancement rather than an issue.

2) Doing it in a lowering phase. If the general consensus is that this is
   the path the patch must take then I'd be happy to reconsider. However,
   at this point the patch does not seem to produce worse code than what
   there was before.

The goal is to make it faster by:

1. Trying to determine the most common case first (e.g. the float is a
   Normal number) and then the rest. The amount of code generated at -O2
   is about the same (+/- 1 instruction), but the code is much better.

2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations to perform the
following:

  if (exponent bits aren't all set or unset)
     return Normal;
  else if (no bits are set on the number after masking out the sign bit)
     return Zero;
  else if (exponent has no bits set)
     return Subnormal;
  else if (mantissa has no bits set)
     return Infinite;
  else
     return NaN;

If the optimization can't be applied, the old implementation is used as a
fall-back.

A limitation of this new approach is that the exponent of the floating
point type has to fit in 31 bits, and the type has to have an IEEE-like
format and IEEE-like values for NaN and INF (e.g. for NaN and INF all bits
of the exponent must be set).

To determine this IEEE likeness a new boolean was added to real_format.
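
For readers who want to see the classification order above in concrete form, here is a minimal sketch in plain C. It is not the code in the patch: the function name classify_double_bits is hypothetical, it handles only double, and it assumes the IEEE-754 binary64 layout (1 sign bit, 11 exponent bits, 52 mantissa bits). It returns the standard FP_* macros from <math.h>.

```c
#include <assert.h>
#include <math.h>    /* FP_NORMAL, FP_ZERO, FP_SUBNORMAL, FP_INFINITE, FP_NAN */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only, not part of the patch.
   Classifies a double using integer operations on its bit pattern.
   Assumes double uses the IEEE-754 binary64 layout.  */
int classify_double_bits(double x)
{
    uint64_t bits;

    /* The sketch only makes sense if double is 64 bits wide.  */
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */

    const uint64_t sign_mask = UINT64_C(0x8000000000000000);
    const uint64_t exp_mask  = UINT64_C(0x7ff0000000000000);
    const uint64_t mant_mask = UINT64_C(0x000fffffffffffff);

    uint64_t exp  = bits & exp_mask;
    uint64_t mant = bits & mant_mask;

    /* Most common case first: exponent neither all zeros nor all ones.  */
    if (exp != 0 && exp != exp_mask)
        return FP_NORMAL;

    /* Nothing set once the sign bit is masked out: +0.0 or -0.0.  */
    if ((bits & ~sign_mask) == 0)
        return FP_ZERO;

    /* Exponent all zeros but mantissa non-zero.  */
    if (exp == 0)
        return FP_SUBNORMAL;

    /* Exponent all ones: the mantissa separates infinity from NaN.  */
    if (mant == 0)
        return FP_INFINITE;

    return FP_NAN;
}
```

On a target where double is binary64, the result should agree with fpclassify (x) from <math.h> for every input, which is a convenient way to sanity-check the ordering of the tests.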

As an example, Aarch64 now generates for classification of doubles:

f:
	fmov	x1, d0
	mov	w0, 7
	sbfx	x2, x1, 52, 11
	add	w3, w2, 1
	tst	w3, 0x07FE
	bne	.L1
	mov	w0, 13
	tst	x1, 0x7fffffffffffffff
	beq	.L1
	mov	w0, 11
	tbz	x2, 0, .L1
	tst	x1, 0xfffffffffffff
	mov	w0, 3
	mov	w1, 5
	csel	w0, w0, w1, ne
.L1:
	ret

No new tests as there are existing tests to test functionality.

glibc benchmarks run against the builtin show a 42.5% performance gain on
Aarch64.

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 also has no regressions and shows modest gains (3%).

Ok for trunk?

Thanks,
Tamar

gcc/
2016-08-25  Tamar Christina  <tamar.christ...@arm.com>
	    Wilco Dijkstra  <wilco.dijks...@arm.com>

	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
	* gcc/real.h (real_format): Added is_ieee_compatible field.
	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
	(mips_single_format): Likewise.
	(motorola_single_format): Likewise.
	(spu_single_format): Likewise.
	(ieee_double_format): Likewise.
	(mips_double_format): Likewise.
	(motorola_double_format): Likewise.
	(ieee_extended_motorola_format): Likewise.
	(ieee_extended_intel_128_format): Likewise.
	(ieee_extended_intel_96_round_53_format): Likewise.
	(ibm_extended_format): Likewise.
	(mips_extended_format): Likewise.
	(ieee_quad_format): Likewise.
	(mips_quad_format): Likewise.
	(vax_f_format): Likewise.
	(vax_d_format): Likewise.
	(vax_g_format): Likewise.
	(decimal_single_format): Likewise.
	(decimal_quad_format): Likewise.
	(ieee_half_format): Likewise.
	(mips_single_format): Likewise.
	(arm_half_format): Likewise.
	(real_internal_format): Likewise.

gcc/testsuite/
2016-09-27  Tamar Christina  <tamar.christ...@arm.com>

	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.