[Bug libgcc/108279] New: Improved speed for float128 routines

tkoenig at gcc dot gnu.org via Gcc-bugs Tue, 03 Jan 2023 12:55:39 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279


            Bug ID: 108279
           Summary: Improved speed for float128 routines
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Our soft-float routines, which are used for the basic float128 arithmetic
(__addtf3, __subtf3, etc.), are much slower than they need to be.

Michael S has some routines which are considerably faster, at
https://github.com/already5chosen/extfloat, which he would like to
contribute to gcc.  There is a rather lengthy thread in comp.arch
starting with https://groups.google.com/g/comp.arch/c/Izheu-k00Nw .

Current status of the discussion:

The routines currently do not support rounding modes; they implement
round-to-nearest, ties-to-even only. Adding support for the other modes
would be feasible.
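
For reference, a minimal sketch of what round-to-nearest, ties-to-even
means when a soft-float routine drops low-order significand bits. The
helper name and the 64-bit width are assumptions for illustration; this
is not taken from the extfloat code or from libgcc.

```c
#include <stdint.h>

/* Hypothetical helper: drop the low `shift` bits of `sig`
   (0 < shift < 64), rounding to nearest with ties to even. */
static uint64_t round_nearest_even(uint64_t sig, unsigned shift)
{
    uint64_t kept    = sig >> shift;
    uint64_t half    = (uint64_t)1 << (shift - 1);
    uint64_t dropped = sig & ((half << 1) - 1);

    if (dropped > half)        /* more than half an ulp: round up */
        return kept + 1;
    if (dropped == half)       /* exact tie: pick the even neighbour */
        return kept + (kept & 1);
    return kept;               /* less than half an ulp: round down */
}
```

With shift 2, binary 1010 is a tie and stays at 10 (already even), while
binary 1110 is a tie and rounds up to 100.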

Handling of the rounding mode is currently done in libgcc by
querying the hardware, which leads to a high overhead for each
call. This would not be needed if -ffast-math (or a relevant
suboption) is specified.

The routines would also be suitable as they are (under a different name)
for Fortran intrinsics such as matmul.

Fortran is a bit special because rounding modes are at their default on
procedure entry and are restored on procedure exit (which is why setting
rounding modes in a subroutine is a no-op). This would make it possible
to keep track of the rounding mode in a local variable.

The current idea would be something like this:

The current behavior of __addtf3 and friends could remain as is:
their speed could be improved, but they would still query the
hardware.

There can be two additional routines for each arithmetic operation. One
of them would implement the operation given a specified rounding mode
(to be called from Fortran when the correct IEEE module is in
use).

The other one would just implement round-to-nearest, for use from
Fortran intrinsics and from all other languages if the right flags
are given. It would be good to bolt this onto some flag which is
used for libgfortran, to make it accessible from C.
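
A rough sketch of that three-entry-point layout follows. All names are
hypothetical, the hardware query is modelled by a plain variable, and
integer halving stands in for a real binary128 operation so that the
rounding step stays visible. It only shows how the variants could share
one core; it is not a proposed implementation.

```c
/* Hypothetical rounding-mode tags; a real routine would use the
   target's own encoding. */
enum rmode { RM_NEAREST, RM_TOWARDZERO, RM_UPWARD, RM_DOWNWARD };

/* Stand-in for the FPU control register that would be read per call. */
static enum rmode hw_rmode = RM_NEAREST;

static enum rmode query_hw_rmode(void)
{
    return hw_rmode;
}

/* New routine 1: explicit rounding mode, no hardware query. */
static long halve_rm(long x, enum rmode mode)
{
    long q = x / 2;   /* C division truncates toward zero */
    long r = x % 2;   /* -1, 0 or 1; nonzero means an exact tie */

    if (r == 0)
        return q;
    switch (mode) {
    case RM_UPWARD:   return r > 0 ? q + 1 : q;
    case RM_DOWNWARD: return r < 0 ? q - 1 : q;
    case RM_NEAREST:  return (q % 2 != 0) ? q + r : q;  /* ties to even */
    default:          return q;                         /* toward zero */
    }
}

/* New routine 2: round-to-nearest only, mode fixed at compile time. */
static long halve_nearest(long x)
{
    return halve_rm(x, RM_NEAREST);
}

/* Existing behavior: query the (modelled) hardware on every call. */
static long halve_dynamic(long x)
{
    return halve_rm(x, query_hw_rmode());
}
```

In this layout the nearest-only variant passes a constant, so a compiler
can specialize the core and drop the other branches, while only the
existing entry point pays for the per-call query.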

Probably gcc14 material.
