[Bug libgcc/108279] Improved speed for float128 routines

wilco at gcc dot gnu.org via Gcc-bugs Wed, 18 Jan 2023 11:02:33 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279


Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #18 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Michael_S from comment #12)

> This set of options does not map too well into real difficulties of
> implementation.
> There are only 2 things that are expensive:
> 1. Inexact Exception
> 2. Fetching of the current rounding mode.
> The rest of IEEE-754 features is so cheap that creating separate variants
> without them simply is not worth the effort of maintaining distinct
> variants, even if all difference is a single three-lines #ifdef

In general reading the current rounding mode is relatively cheap, but modifying
can be expensive, so optimized fenv implementations in GLIBC only modify the FP
status if a change is required. It should be feasible to check for
round-to-even and use optimized code for that case.

> BTW, Inexact Exception can be made fairly affordable with a little help from
> compiler. All we need for that is ability to say "don't remove this floating
> point addition even if you don't see that it produces any effect".
> Something similar to 'volatile', but with volatile compiler currently puts
> result of addition on stack, which adds undesirable cost.
> However, judged by comment of Jakub, compiler maintainers are not
> particularly interested in this enterprise.

There are macros in GLIBC math-barriers.h which do what you want - eg. AArch64:

#define math_opt_barrier(x)                                     \
  ({ __typeof (x) __x = (x); __asm ("" : "+w" (__x)); __x; })
#define math_force_eval(x)                                              \
  ({ __typeof (x) __x = (x); __asm __volatile__ ("" : : "w" (__x)); })

The first blocks optimizations (like constant folding) across the barrier, the
2nd forces evaluation of an expression even if it is deemed useless. These are
used in many math functions in GLIBC. They are target specific due to needing
inline assembler operands, but it should be easy to add similar definitions to
libgcc.

[Bug libgcc/108279] Improved speed for float128 routines

Reply via email to