Re: [PATCH v10] Practical improvement to libgcc complex divide

2021-04-20 Thread Joseph Myers
On Wed, 7 Apr 2021, Patrick McGehearty via Gcc-patches wrote:

> +   macro_name = XALLOCAVEC (char, name_len
> +                            + sizeof ("__LIBGCC__EXCESS_PRECISION__"));
>     sprintf (macro_name, "__LIBGCC_%s_EXCESS_PRECISION__", name);
>     builtin_define_with_int_value (macro_name, excess_precision);
> +
> +   char val_name[64];
> +
> +   macro_name = XALLOCAVEC (char, name_len
> +                            + sizeof ("__LIBGCC_EPSILON__"));
> +   sprintf (macro_name, "__LIBGCC_%s_EPSILON__", name);
> +   sprintf (val_name, "__%s_EPSILON__", float_h_prefix);
> +   builtin_define_with_value (macro_name, val_name, 0);
> +
> +   macro_name = XALLOCAVEC (char, name_len + sizeof ("__LIBGCC_MAX__"));
> +   sprintf (macro_name, "__LIBGCC_%s_MAX__", name);
> +   sprintf (val_name, "__%s_MAX__", float_h_prefix);
> +   builtin_define_with_value (macro_name, val_name, 0);
> +
> +   macro_name = XALLOCAVEC (char, name_len + sizeof ("__LIBGCC_MIN__"));
> +   sprintf (macro_name, "__LIBGCC_%s_MIN__", name);
> +   sprintf (val_name, "__%s_MIN__", float_h_prefix);
> +   builtin_define_with_value (macro_name, val_name, 0);

I think there's an off-by-one error calculating the allocation sizes for 
these three macro names.  Note that the code just above uses

  sizeof ("__LIBGCC__EXCESS_PRECISION__")

with two underscores between "__LIBGCC" and "EXCESS_PRECISION__", 
reflecting that the macro name being constructed has both those 
underscores (around the %s expansion of size name_len).  I think the three 
sizeof calls in the three subsequent allocations likewise need to have 
both those underscores present in their arguments.
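To make that concrete, here is a minimal sketch of one of the
corrected allocations (mirroring the quoted hunk; untested, and the
continuation whitespace is illustrative):

  macro_name = XALLOCAVEC (char, name_len
                           + sizeof ("__LIBGCC__EPSILON__"));
  sprintf (macro_name, "__LIBGCC_%s_EPSILON__", name);

The second underscore in the sizeof argument accounts for the '_' on
each side of the %s expansion; sizeof itself already covers the
trailing NUL, so without that extra character the buffer is one byte
short.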

> diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
> index d261f40..f7fa47f 100644
> --- a/libgcc/config/rs6000/_divkc3.c
> +++ b/libgcc/config/rs6000/_divkc3.c
> @@ -37,29 +37,115 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define __divkc3 __divkc3_sw
>  #endif
>  
> +#define RBIG   (__LIBGCC_TF_MAX__ / 2)
> +#define RMIN   (__LIBGCC_TF_MIN__)
> +#define RMIN2  (__LIBGCC_TF_EPSILON__)
> +#define RMINSCAL (1 / __LIBGCC_TF_EPSILON__)
> +#define RMAX2  (RBIG * RMIN2)

This file includes quad-float128.h, which does some remapping from TF to 
KF depending on __LONG_DOUBLE_IEEE128__.

I think you probably need to have a similar __LONG_DOUBLE_IEEE128__ 
conditional here.  If __LONG_DOUBLE_IEEE128__ is not defined, use 
__LIBGCC_KF_* macros instead of __LIBGCC_TF_*; if __LONG_DOUBLE_IEEE128__ 
is defined, use __LIBGCC_TF_* as above.  (Unless the powerpc maintainers 
say otherwise.)
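In outline, the conditional could look like this (a sketch of the
suggestion only, subject to what the powerpc maintainers prefer):

  #ifdef __LONG_DOUBLE_IEEE128__
  /* long double is IEEE128, so the TF-mode macros describe this type.  */
  # define RBIG     (__LIBGCC_TF_MAX__ / 2)
  # define RMIN     (__LIBGCC_TF_MIN__)
  # define RMIN2    (__LIBGCC_TF_EPSILON__)
  # define RMINSCAL (1 / __LIBGCC_TF_EPSILON__)
  #else
  /* quad-float128.h remaps this file to KF mode; use the KF macros.  */
  # define RBIG     (__LIBGCC_KF_MAX__ / 2)
  # define RMIN     (__LIBGCC_KF_MIN__)
  # define RMIN2    (__LIBGCC_KF_EPSILON__)
  # define RMINSCAL (1 / __LIBGCC_KF_EPSILON__)
  #endif
  #define RMAX2     (RBIG * RMIN2)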

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v10] Practical improvement to libgcc complex divide

2021-04-15 Thread Patrick McGehearty via Gcc-patches

- ping

[A sincere and special thanks for the sustained perseverance of the
reviewers in pointing me in the proper direction to get this patch
polished. The original proposal was June 1, 2020 and covered only
double precision. The current version is dramatically better: it
extends coverage to most precisions, improves the computation's
accuracy and speed, and improves the code's maintainability.
- Patrick McGehearty]


[PATCH v10] Practical improvement to libgcc complex divide

2021-04-07 Thread Patrick McGehearty via Gcc-patches
Changes in this version from Version 9:

Replaced all uses of alloca with XALLOCAVEC in
c_cpp_builtins() in c-cppbuiltin.c
Did not replace alloca elsewhere in the same file.

Fixed type of name_len.
Fixed prototypes for abort & exit in new test programs.
Fixed spelling errors and omitted words in comments.
Changed XMTYPE to AMTYPE ("A" for "additional") to avoid confusion
with extended types.
Removed XCTYPE as unused.

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html

Summary of Purpose

This patch to __divdc3 and the other default complex divide
routines in libgcc/libgcc2.c (half, float, double, extended, and
long double precisions) substantially improves the quality of
answers when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2],
further modified by C99's requirements for dealing with NaN (not a
number) results. When working with input values whose exponents are
greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, the results
differ substantially from the answers provided by quad precision
more than 1% of the time. (For double precision, DBL_MAX_EXP is
1024, so trouble begins when exponent magnitudes exceed about 512;
at that point a textbook c*c + d*d denominator already overflows.)
This error rate may be unacceptable for many applications that
cannot a priori restrict their computations to the safe range. The
proposed method reduces the frequency of "substantially different"
answers by more than 99% for double precision, at a modest cost in
performance.

The differences between the current GCC method and the new method
are described first; accuracy and performance differences are
discussed afterwards.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  A study of
Beebe [1] provided an overview of past and recent practice for
computing complex divide. The current glibc implementation is based
on Robert Smith's algorithm [2] from 1962.  A Google search found
the paper by Baudin and Smith [3] (the same Robert Smith) published
in 2012. Elen Kalda's proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values
over a restricted range and over the full range of input values. The
current complex divide handled the restricted range well enough, but
failed on the full range more than 1% of the time. Baudin and
Smith's primary test for "ratio" equal to zero reduced the cases
with 16 or more error bits by a factor of 5, but still left too many
flawed answers. Adding debug printouts to cases with substantial
errors let me see the intermediate calculations for test values that
failed. I noted that for many of the failures, "ratio" was
subnormal. Changing the "ratio" test from a check for zero to a
check for subnormal reduced the 16-bit error rate by another factor
of 12. This single modified test provides the greatest benefit for
the least cost, but the percentage of cases with greater than 16-bit
errors (double precision data) is still greater than 0.027% (2.7 in
10,000).
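A minimal sketch of that test change, using double precision names
for illustration (the patch's actual code is generic across
precisions):

  #include <float.h>
  #include <math.h>

  /* Smith's method computes ratio = d/c (for |c| >= |d|) and takes an
     alternate evaluation path when the ratio is degenerate.  */

  /* Baudin & Smith's primary test: alternate path only when the
     ratio is exactly zero.  */
  static int use_alternate_old (double ratio) { return ratio == 0.0; }

  /* Modified test: also take the alternate path when the ratio is
     subnormal (|ratio| < DBL_MIN), which accounted for most of the
     remaining large errors.  */
  static int use_alternate_new (double ratio) { return fabs (ratio) < DBL_MIN; }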

Continued examination of the remaining errors and their intermediate
computations led to the various tests on input values and the
scaling used to avoid under/overflow. The current patch does not
handle some of the rarest and most extreme combinations of input
values, but the random test data shows only 1 case in 10 million
with an error greater than 12 bits. That case has 18 bits of error
and is due to subtraction cancellation. These results are
significantly better than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included, as all are handled with suitable preprocessor symbols
in a single source routine. Since half precision is computed in
float precision as per current libgcc practice, the enhanced
algorithm provides no benefit for half precision and would cost
performance. Further investigation showed that changing the half
precision algorithm to use the simple textbook formula (real =
a*c + b*d, imag = b*c - a*d, each divided by c*c + d*d) caused no
loss of precision and gave a modest improvement in performance.
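In outline, that simple evaluation looks like this (operands already
promoted to float per libgcc practice; the names are illustrative,
not the patch's):

  /* (a + b*i) / (c + d*i) by the textbook formula; float
     intermediates give half precision ample headroom.  */
  static void
  half_divide_simple (float a, float b, float c, float d,
                      float *rr, float *ri)
  {
    float denom = c * c + d * d;
    *rr = (a * c + b * d) / denom;
    *ri = (b * c - a * d) / denom;
  }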

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases. This
use is made generic by defining appropriate __LIBGCC_* macros in
c-cppbuiltin.c.
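For example, for double precision (mode name DF, float.h prefix DBL)
the generated macros allow the generic source to write, in outline
(a sketch; the exact spelling in libgcc2.c may differ):

  #define RBIG (__LIBGCC_DF_MAX__ / 2)   /* defined from __DBL_MAX__ */
  #define RMIN (__LIBGCC_DF_MIN__)       /* defined from __DBL_MIN__ */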

Tests are added for when both parts of the denominator have
exponents small enough that, after shifting any subnormal values to
normal values, all input values can be scaled up without risking
overflow. That gained a clear improvement in accuracy. Similarly,
when either part of the numerator was subnormal and the other part
and both denominator values were not too large, scaling could be
used to reduce the risk of computing with subnormals. The test and
scaling values used all fit within the exponent range required by
the C standard for each precision.
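A rough sketch of the scaling idea in double precision terms (the
patch's actual tests and thresholds are more involved; RMIN, RMIN2,
RMINSCAL, and RMAX2 are as in the _divkc3.c hunk quoted above,
transposed to double):

  #include <math.h>

  /* Computing (a + b*i) / (c + d*i).  If both denominator parts are
     subnormal and the numerator parts are small enough that scaling
     cannot overflow, multiply all four operands by RMINSCAL =
     1/EPSILON (a power of two, so the quotient is unchanged) to move
     the computation into the normal range.  */
  if (fabs (c) < RMIN && fabs (d) < RMIN
      && fabs (a) < RMAX2 && fabs (b) < RMAX2)
    {
      a *= RMINSCAL;
      b *= RMINSCAL;
      c *= RMINSCAL;
      d *= RMINSCAL;
    }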

Float precision has more diffic