On Mon, 24 Jun 2019, Richard Biener wrote:

-frounding-math is supposed to be equivalent to "#pragma STDC FENV_ACCESS
ON" covering the whole program.

For constant expressions, I see a difference between
constexpr double third = 1. / 3.;
which really needs to be done at compile time, and
const double third = 1. / 3.;
which will try to evaluate the rhs as constexpr, but where the program is
still valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding
direction. For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions and I wouldn't mind a compile error.
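To make the distinction concrete, here is a minimal sketch of the two behaviours being contrasted. It is not a description of what GCC does under -frounding-math. The helper name divide_with_rounding is hypothetical, and the volatile temporaries are only an assumption-laden way to keep a compiler from folding the division or moving it across the fesetround() calls.

```cpp
#include <cassert>
#include <cfenv>

// Forced compile-time evaluation: the value is fixed by the compiler
// (round-to-nearest in practice) and cannot see a mode set at run time.
constexpr double third_ct = 1. / 3.;

// Hypothetical helper: performs num / den at run time under `mode`
// (one of the FE_* rounding macros), then restores the previous mode.
double divide_with_rounding(double num, double den, int mode) {
    // volatile operands and result: the division cannot be folded, and
    // it stays between the two opaque fesetround() calls.
    volatile double n = num, d = den;
    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0);  // fesetround() returns 0 on success
    (void)rc;
    volatile double q = n / d;
    std::fesetround(saved);
    return q;
}
```

With IEEE doubles, divide_with_rounding(1., 3., FE_UPWARD) is one ulp above third_ct, while FE_TONEAREST and FE_DOWNWARD both reproduce it. A "const double third = 1. / 3.;" that honours dynamic rounding would behave like the helper, not like third_ct.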

Note that C2x adds a pragma fenv_round that specifies a rounding direction
for a region of code, which seems relevant for constant expressions. That
pragma looks hard, but maybe some pieces would be nice to add.

Hmm.  My thinking was along the lines that the C abstract machine might
specify that, at the start of main(), the initial rounding mode (and
exception state) is implementation-defined, and that all constant
expressions are evaluated in this state.  So we could define that to be
round-to-nearest and simply fold all constants, as we see them, in
contexts where we are allowed to evaluate at compile time?

There are way too many such contexts. In C++, any initializer is
constexpr-evaluated if possible (PR 85746 shows that this is bad for
__builtin_constant_p), and I do want
double d = 1. / 3;
to depend on the dynamic rounding direction. I'd rather err on the other
extreme and only fold when we are forced to, say
constexpr double d = 1. / 3;
or even reject it because it is inexact, if pragmas put us in a region
with dynamic rounding.

OK, fair enough.  I just hoped that globals like

double x = 1.0/3.0;

do not become runtime initializers with -frounding-math ...

Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round,
which I guess could affect this (the C draft isn't very explicit), the
program doesn't have many chances to set a rounding mode before
initializing globals. It could do so in the initializer of another
variable, but relying on the order of initialization this way seems bad.
Maybe in this case it would make sense to assume the default rounding
mode...

In practice, I would only set -frounding-math on a per function basis
(possibly using pragma fenv_access), so the optimization of what happens
to globals doesn't seem so important.
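As an illustration of the per-function use case, here is a sketch of a function that switches rounding direction locally to bracket a quotient, in the style of interval arithmetic. The names Interval, RoundingScope and enclose_quotient are hypothetical. The volatile temporaries are a stand-in for what -frounding-math (or fenv_access) is meant to guarantee, so that the sketch does not depend on those options being honoured.

```cpp
#include <cassert>
#include <cfenv>

struct Interval { double lo, hi; };

// Hypothetical RAII guard: sets a rounding mode for the enclosing scope
// and restores the previous one on exit.
class RoundingScope {
public:
    explicit RoundingScope(int mode) : saved_(std::fegetround()) {
        const int rc = std::fesetround(mode);
        assert(rc == 0);  // fesetround() returns 0 on success
        (void)rc;
    }
    ~RoundingScope() { std::fesetround(saved_); }
    RoundingScope(const RoundingScope&) = delete;
    RoundingScope& operator=(const RoundingScope&) = delete;
private:
    int saved_;
};

// Returns [lo, hi] with lo <= num/den <= hi; requires den != 0.
Interval enclose_quotient(double num, double den) {
    assert(den != 0.0);
    // volatile keeps each division inside its own rounding scope even
    // when the compiler is not told about the FP environment.
    volatile double n = num, d = den;
    volatile double lo, hi;
    {
        RoundingScope down(FE_DOWNWARD);
        lo = n / d;
    }
    {
        RoundingScope up(FE_UPWARD);
        hi = n / d;
    }
    return Interval{lo, hi};
}
```

For 1/3 the two bounds differ by one ulp; for an exactly representable quotient such as 1/4 they coincide. Only this function needs dynamic rounding, and the caller's mode is unchanged on return.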

Side remark, I am sad that Intel added rounded versions for scalars and
512 bit vectors but not for intermediate sizes, while I am most
interested in 128 bits. Masking most of the 512 bits still causes the
dreaded clock slow-down.

Ick.  I thought this was vector-length agnostic...

I think all of the new stuff in AVX512 is, except rounding...

Also, the rounded functions have exceptions disabled, which may make
them hard to use with fenv_access.
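For reference, a sketch of what the scalar rounded form looks like through the intrinsics, assuming GCC or Clang on x86-64. The wrapper names div_up_avx512 and div_up_or_nan are hypothetical. Note that the rounding argument has to be combined with _MM_FROUND_NO_EXC, which is the exception suppression mentioned above.

```cpp
#include <cassert>
#include <limits>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

// Built for AVX-512F whatever the command-line -m flags are; it must
// only be called on a CPU that supports the extension.
__attribute__((target("avx512f")))
static double div_up_avx512(double a, double b) {
    assert(__builtin_cpu_supports("avx512f"));
    // Static rounding toward +infinity, encoded in the instruction
    // itself; the MXCSR rounding mode is neither read nor changed.
    __m128d q = _mm_div_round_sd(_mm_set_sd(a), _mm_set_sd(b),
                                 _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
    return _mm_cvtsd_f64(q);
}
#endif

// Returns a / b rounded toward +infinity, or NaN when the rounded
// instruction is not available on this build or this machine.
double div_up_or_nan(double a, double b) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx512f"))
        return div_up_avx512(a, b);
#endif
    return std::numeric_limits<double>::quiet_NaN();
}
```

On a machine with AVX-512F, div_up_or_nan(1., 3.) is one ulp above the round-to-nearest quotient, without touching the dynamic rounding mode.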

I guess builtins need the same treatment for -ftrapping-math as they
do for -frounding-math.  I think you already mentioned the default
of this flag doesn't make much sense (well, the flag isn't fully
honored/implemented).

PR 54192
(coincidentally, it caused a missed vectorization in
https://stackoverflow.com/a/56681744/1918193 last week)

I commented there.  Let's just make -frounding-math == FENV_ACCESS ON
and keep -ftrapping-math as whether FP exceptions raise traps.

One issue is that the C pragmas do not let me convey that I am interested
in dynamic rounding but not exception flags. It is possible to optimize
quite a bit more with just rounding. In particular, the functions are pure
(at some point we will have to teach the compiler the difference between
the FP environment and general memory, but I'd rather wait).

Yeah.  Auto-vectorizing would also need adjustment of course (also
costing like estimate_num_insns or others).

Anything that is only about optimizing the code in -frounding-math
functions can wait; that's the good thing about implementing a new feature.

--
Marc Glisse
