On Sat, 16 May 2026 at 17:45, Tom Lane <[email protected]> wrote:
>
> BTW, on the principle of "where else did we make the same mistake",
> I looked through the other aggregates using float8_regr_accum.
> Most seem okay, but float8_regr_intercept does this:
>
>         PG_RETURN_FLOAT8((Sy - Sx * Sxy / Sxx) / N);
>
> Seems to me that expression is also prone to internal
> overflow/underflow.  Underflow probably isn't a huge issue,
> since the result will reduce to Sy/N which is likely to be good
> enough.  But can we do anything about overflow?
>
> One simple change that might make things better is to compute
>
>         PG_RETURN_FLOAT8((Sy - Sx * (Sxy / Sxx)) / N);
>
> on the theory that the sums of products are likely to both be large.

Hmm, that isn't necessarily better. For example, with this data:

WITH t(x,y) AS (
  SELECT 1e-155 + g*1e-160, 1e155 + g*1e150
    FROM generate_series(1,10) g
)
SELECT sum(x::float8) sx, sum(y::float8) sy,
       regr_sxx(y,x), regr_syy(y,x), regr_sxy(y,x),
       regr_intercept(y,x)
  FROM t;

           sx            |      sy       |   regr_sxx   |
regr_syy        |       regr_sxy        |     regr_intercept
-------------------------+---------------+--------------+------------------------+-----------------------+-------------------------
 1.0000550000000001e-154 | 1.000055e+156 | 8.24996e-319 |
8.249999999970085e+301 | 8.249999999965278e-09 |
-5.144448004587567e+149
(1 row)

The current regr_intercept() code works fine, but if you were to
attempt to calculate Sxy / Sxx first, it would overflow.

I think probably the least likely to overflow computation would be Sxy
* (Sx / Sxx), because Sxx is likely to be very large/small whenever Sx
is, so Sx / Sxx seems unlikely to overflow. There may well be examples
disproving that theory too though, so maybe it needs to try multiple
orderings.

Regards,
Dean


Reply via email to