Re: [PATCHv2][GCC] Optimise the fpclassify builtin to perform integer operations when possible

Tamar Christina Mon, 17 Oct 2016 02:06:19 -0700

Ping

________________________________________
From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
of Tamar Christina <tamar.christ...@arm.com>
Sent: Friday, September 30, 2016 2:22:35 PM
To: GCC Patches
Cc: nd; Richard Earnshaw; Wilco Dijkstra; ja...@redhat.com; Joseph Myers; 
Michael Meissner; rguent...@suse.de; Moritz Klammler; Andrew Pinski; 
l...@redhat.com
Subject: [PATCHv2][GCC] Optimise the fpclassify builtin to perform integer 
operations when possible


Hi All,

This is v2 of the patch which adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

I have addressed most comments from everyone except for two things:

1) Providing a back-end hook to override the functionality. While this is
   certainly possible, the current fpclassify doesn't provide one either, so
   I'd like to treat it as an enhancement rather than an issue.

2) Doing it in a lowering phase. If the general consensus is that this is the
   path the patch must take then I'd be happy to reconsider. However, at this
   time this patch does not seem to produce worse code than what there was
   before.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same (+/- 1 instruction), but the code
   is much better.
2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following:

  if (exponent bits are neither all set nor all unset)
     return Normal;
  else if (no bits are set in the number after masking out
           the sign bit)
     return Zero;
  else if (exponent has no bits set)
     return Subnormal;
  else if (mantissa has no bits set)
     return Infinite;
  else
     return NaN;
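
For illustration only, the pseudocode above can be sketched in plain C for
an IEEE-754 binary64 double. This is not code from the patch: the helper
name classify_double_bits is hypothetical, and the sketch assumes that
double is the 64-bit IEEE format and returns the FP_* macros from <math.h>.

```c
#include <assert.h>
#include <math.h>    /* FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO */
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Hypothetical helper, not part of the patch: classify a binary64 value
   using integer operations only, testing the common (normal) case first. */
static int classify_double_bits(double x)
{
    uint64_t bits;

    /* Assumption: double is the 64-bit IEEE-754 format. */
    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */

    const uint64_t sign_mask = UINT64_C(0x8000000000000000);
    const uint64_t exp_mask  = UINT64_C(0x7ff0000000000000);
    const uint64_t mant_mask = UINT64_C(0x000fffffffffffff);

    uint64_t exp = bits & exp_mask;

    if (exp != 0 && exp != exp_mask)  /* exponent neither all 0s nor all 1s */
        return FP_NORMAL;
    if ((bits & ~sign_mask) == 0)     /* nothing set besides the sign bit */
        return FP_ZERO;
    if (exp == 0)                     /* zero exponent, non-zero mantissa */
        return FP_SUBNORMAL;
    if ((bits & mant_mask) == 0)      /* all-ones exponent, zero mantissa */
        return FP_INFINITE;
    return FP_NAN;                    /* all-ones exponent, non-zero mantissa */
}
```

On such a target, classify_double_bits (x) should agree with the standard
fpclassify (x) macro for every double x.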

In case the optimization can't be applied the old
implementation is used as a fall-back.

A limitation of this new approach is that the exponent
of the floating-point format has to fit in 31 bits and the format
has to have an IEEE-like layout and values for NaN and INF
(e.g. for NaN and INF all bits of the exponent must be set).

To determine this IEEE likeness a new boolean was added to real_format.
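
As a rough sketch of how such a flag could gate the optimized path: the
struct below is a hypothetical, much simplified stand-in for real_format,
and both the exp_bits field and the function can_use_integer_path are
illustrative names, not GCC's. It also assumes that "fits in 31 bits"
refers to the width of the exponent field.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a format descriptor.  Only the
   is_ieee_compatible name comes from the ChangeLog; the rest is assumed. */
struct fmt_desc {
    int  exp_bits;            /* assumed: width of the exponent field */
    bool is_ieee_compatible;  /* IEEE-like layout and NaN/INF encoding */
};

/* Take the integer path only when both stated limitations are met;
   otherwise the caller would fall back to the old implementation. */
static bool can_use_integer_path(const struct fmt_desc *fmt)
{
    return fmt->is_ieee_compatible
           && fmt->exp_bits > 0
           && fmt->exp_bits <= 31;
}
```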

As an example, AArch64 now generates the following for classification of
doubles:

f:
        fmov    x1, d0
        mov     w0, 7
        sbfx    x2, x1, 52, 11
        add     w3, w2, 1
        tst     w3, 0x07FE
        bne     .L1
        mov     w0, 13
        tst     x1, 0x7fffffffffffffff
        beq     .L1
        mov     w0, 11
        tbz     x2, 0, .L1
        tst     x1, 0xfffffffffffff
        mov     w0, 3
        mov     w1, 5
        csel    w0, w0, w1, ne

.L1:
        ret
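
For readers less familiar with AArch64, the listing above can be read as
roughly the following C. This is a hypothetical rendering, not code from
the patch. The return values 3, 5, 7, 11 and 13 are taken from the listing;
mapping them to NaN, infinite, normal, subnormal and zero is an inference
from the branch order in the pseudocode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the listing; not code from the patch. */
static int classify_like_listing(double x)
{
    uint64_t bits;

    /* Assumption: double is the 64-bit IEEE-754 format. */
    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);               /* fmov x1, d0 */

    /* sbfx x2, x1, 52, 11: extract the exponent field and sign-extend it
       from 11 bits, so all-ones becomes -1 and all-zeros stays 0.  */
    uint32_t field = (uint32_t)(bits >> 52) & 0x7ffu;
    uint32_t sext = (field ^ 0x400u) - 0x400u;    /* wraps modulo 2^32 */

    /* add w3, w2, 1 ; tst w3, 0x07FE ; bne .L1
       After adding 1, only the values -1 and 0 leave bits 1..10 clear.  */
    if ((sext + 1u) & 0x7feu)
        return 7;                                 /* normal */

    /* tst x1, 0x7fffffffffffffff ; beq .L1 */
    if ((bits & UINT64_C(0x7fffffffffffffff)) == 0)
        return 13;                                /* zero */

    /* tbz x2, 0, .L1: bit 0 clear means the exponent field was zero.  */
    if ((sext & 1u) == 0)
        return 11;                                /* subnormal */

    /* tst x1, 0xfffffffffffff ; csel w0, w0, w1, ne */
    return (bits & UINT64_C(0xfffffffffffff)) ? 3 : 5;  /* NaN : infinite */
}
```

The sign extension is what lets a single add-and-test separate the normal
case from the two special exponent values.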

No new tests as there are existing tests to test functionality.
glibc benchmarks were run against the builtin and show a 42.5%
performance gain on AArch64.

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 also has no regressions and modest gains (3%).

Ok for trunk?

Thanks,
Tamar

gcc/
2016-08-25  Tamar Christina  <tamar.christ...@arm.com>
            Wilco Dijkstra  <wilco.dijks...@arm.com>

        * gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
        * gcc/real.h (real_format): Added is_ieee_compatible field.
        * gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
        (mips_single_format): Likewise.
        (motorola_single_format): Likewise.
        (spu_single_format): Likewise.
        (ieee_double_format): Likewise.
        (mips_double_format): Likewise.
        (motorola_double_format): Likewise.
        (ieee_extended_motorola_format): Likewise.
        (ieee_extended_intel_128_format): Likewise.
        (ieee_extended_intel_96_round_53_format): Likewise.
        (ibm_extended_format): Likewise.
        (mips_extended_format): Likewise.
        (ieee_quad_format): Likewise.
        (mips_quad_format): Likewise.
        (vax_f_format): Likewise.
        (vax_d_format): Likewise.
        (vax_g_format): Likewise.
        (decimal_single_format): Likewise.
        (decimal_quad_format): Likewise.
        (ieee_half_format): Likewise.
        (mips_single_format): Likewise.
        (arm_half_format): Likewise.
        (real_internal_format): Likewise.


gcc/testsuite/
2016-09-27  Tamar Christina  <tamar.christ...@arm.com>

        * gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
