On Thu, 3 Aug 2023, Matthew Malcomson wrote:
> >
> > I think this is undesriable. With fused you mean we use FMA?
> > I think you could use -ffp-contract=off for the TU instead.
> >
> > Note you can't use __attribute__((noinline)) literally since the
> > host compiler might not support this.
> >
> > Richard.
> >
>
>
> Trying to make the timevar store integral time intervals.
> Hope this is acceptable -- I had originally planned to use
> `-ffp-contract` as agreed until I saw the email mentioning the old x86
> bug in the same area which was not to do with floating point contraction
> of operations (PR 99903) and figured it would be better to try and solve
> both at the same time while making things in general a bit more robust.
> ________________________________________________________________________
>
>
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
>
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`. These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>
> ------------------------------
> Avoiding the inlining which was agreed to be undesirable. Three
> alternative approaches:
> 1) Use `-ffp-contract=on` to avoid this particular optimisation.
> 2) Adjusting the code so that the "tolerance" is always of the order of
> a "tick".
> 3) Recording times and elapsed differences in integral values.
> - Could be in terms of a standard measurement (e.g. nanoseconds or
> microseconds).
> - Could be in terms of whatever integral value ("ticks" /
> secondsµseconds / "clock ticks") is returned from the syscall
> chosen at configure time.
>
> While `-ffp-contract=on` removes the problem that I bumped into, there
> has been a similar bug on x86 that was to do with a different floating
> point problem that also happens after `get_time` and
> `timevar_accumulate` both being inlined into the same function. Hence
> it seems worth choosing a different approach.
>
> Of the two other solutions, recording measurements in integral values
> seems the most robust against slightly "off" measurements being
> presented to the user -- even though it could avoid the ICE that creates
> a flaky test.
>
> I considered storing time in whatever units our syscall returns and
> normalising them at the time we print out rather than normalising them
> to nanoseconds at the point we record our "current time". The logic
> being that normalisation could have some rounding affect (e.g. if
> TICKS_PER_SECOND is 3) that would be taken into account in calculations.
>
> I decided against it in order to give the values recorded in
> `timevar_time_def` some interpretive value so it's easier to read the
> code. Compared to the small rounding that would represent a tiny amount
> of time and AIUI can not trigger the same kind of ICE's as we are
> attempting to fix, said interpretive value seems more valuable.
>
> Recording time in microseconds seemed reasonable since all obvious
> values for ticks and `getrusage` are at microsecond granularity or less
> precise. That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
> variables given to use by the host system I was not sure of that enough
> to make this decision.
>
> ------------------------------
> timer::all_zero is ignoring rows which are inconsequential to the user
> and would be printed out as all zeros. Since upon printing rows we
> convert to the same double value and print out the same precision as
> before, we return true/false based on the same amount of time as before.
>
> timer::print_row casts to a floating point measurement in units of
> seconds as was printed out before.
>
> timer::validate_phases -- I'm printing out nanoseconds here rather than
> floating point seconds since this is an error message for when things
> have "gone wrong" printing out the actual nanoseconds that have been
> recorded seems like the best approach.
> N.b. since we now print out nanoseconds instead of floating point value
> the padding requirements are different. Originally we were padding to
> 24 characters and printing 18 decimal places. This looked odd with the
> now visually smaller values getting printed. I judged 13 characters
> (corresponding to 2 hours) to be a reasonable point at which our
> alignment could start to degrade and this provides a more compact output
> for the majority of cases (checked by triggering the error case via
> GDB).
>
> ------------------------------
> N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC". I believe this
> would fit in an integer on all hosts that GCC supports, but am not
> certain there are not strange integer sizes we support hence am pointing
> it out for special attention during review.
>
> ------------------------------
> No expected change in generated code.
> Bootstrapped and regtested on AArch64 with no regressions.
> Manually checked that flaky test is no longer flaky on the machine it
> was seen before.
The patch looks OK to me if it passes bootstrap / regtest and the
output of -ftime-report doesn't change (too much).
Thanks,
Richard.
> gcc/ChangeLog:
>
> PR middle-end/110316
> PR middle-end/9903
> * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
> CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
> (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
> (timer::validate_phases): Use integral arithmetic to check
> validity.
> (timer::print_row, timer::print): Convert from integral
> nanoseconds to floating point seconds before printing.
> (timer::all_zero): Change limit to nanosec count instead of
> fractional count of seconds.
> * timevar.h (struct timevar_time_def): Update all measurements
> to use uint64_t nanoseconds rather than seconds stored in a
> double.
>
>
>
> ############### Attachment also inlined for ease of reply
> ###############
>
>
> diff --git a/gcc/timevar.h b/gcc/timevar.h
> index
> ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9073149709dd05
> 100644
> --- a/gcc/timevar.h
> +++ b/gcc/timevar.h
> @@ -44,7 +44,7 @@
> */
>
> /* This structure stores the various varieties of time that can be
> - measured. Times are stored in seconds. The time may be an
> + measured. Times are stored in nanoseconds. The time may be an
> absolute time or a time difference; in the former case, the time
> base is undefined, except that the difference between two times
> produces a valid time difference. */
> @@ -52,14 +52,13 @@
> struct timevar_time_def
> {
> /* User time in this process. */
> - double user;
> + uint64_t user;
>
> - /* System time (if applicable for this host platform) in this
> - process. */
> - double sys;
> + /* System time (if applicable for this host platform) in this process. */
> + uint64_t sys;
>
> /* Wall clock time. */
> - double wall;
> + uint64_t wall;
>
> /* Garbage collector memory. */
> size_t ggc_mem;
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index
> d695297aae7f6b2a6de01a37fe86c2a232338df0..c6094b1c336c30e5a404ff8fd5c79795edbd59f7
> 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -42,7 +42,7 @@ struct tms
> # define RUSAGE_SELF 0
> #endif
>
> -/* Calculation of scale factor to convert ticks to microseconds.
> +/* Calculation of scale factor to convert ticks to seconds.
> We mustn't use CLOCKS_PER_SEC except with clock(). */
> #if HAVE_SYSCONF && defined _SC_CLK_TCK
> # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
> @@ -91,14 +91,15 @@ struct tms
> the underlying constants, and that can be very slow, so we have to
> precompute them. Whose wonderful idea was it to make all those
> _constants_ variable at run time, anyway? */
> +#define NANOSEC_PER_SEC 1000000000
> #ifdef USE_TIMES
> -static double ticks_to_msec;
> -#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
> +static uint64_t ticks_to_nanosec;
> +#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
> #endif
>
> #ifdef USE_CLOCK
> -static double clocks_to_msec;
> -#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
> +static uint64_t clocks_to_nanosec;
> +#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
> #endif
>
> /* Non-NULL if timevars should be used. In GCC, this happens with
> @@ -222,18 +223,20 @@ get_time (struct timevar_time_def *now)
> {
> #ifdef USE_TIMES
> struct tms tms;
> - now->wall = times (&tms) * ticks_to_msec;
> - now->user = tms.tms_utime * ticks_to_msec;
> - now->sys = tms.tms_stime * ticks_to_msec;
> + now->wall = times (&tms) * ticks_to_nanosec;
> + now->user = tms.tms_utime * ticks_to_nanosec;
> + now->sys = tms.tms_stime * ticks_to_nanosec;
> #endif
> #ifdef USE_GETRUSAGE
> struct rusage rusage;
> getrusage (RUSAGE_SELF, &rusage);
> - now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
> - now->sys = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
> + now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
> + + rusage.ru_utime.tv_usec * 1000;
> + now->sys = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
> + + rusage.ru_stime.tv_usec * 1000;
> #endif
> #ifdef USE_CLOCK
> - now->user = clock () * clocks_to_msec;
> + now->user = clock () * clocks_to_nanosec;
> #endif
> }
> }
> @@ -271,10 +274,10 @@ timer::timer () :
> /* Initialize configuration-specific state.
> Ideally this would be one-time initialization. */
> #ifdef USE_TIMES
> - ticks_to_msec = TICKS_TO_MSEC;
> + ticks_to_nanosec = TICKS_TO_NANOSEC;
> #endif
> #ifdef USE_CLOCK
> - clocks_to_msec = CLOCKS_TO_MSEC;
> + clocks_to_nanosec = CLOCKS_TO_NANOSEC;
> #endif
> }
>
> @@ -583,12 +586,11 @@ timer::validate_phases (FILE *fp) const
> {
> unsigned int /* timevar_id_t */ id;
> const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
> - double phase_user = 0.0;
> - double phase_sys = 0.0;
> - double phase_wall = 0.0;
> + uint64_t phase_user = 0;
> + uint64_t phase_sys = 0;
> + uint64_t phase_wall = 0;
> size_t phase_ggc_mem = 0;
> static char phase_prefix[] = "phase ";
> - const double tolerance = 1.000001; /* One part in a million. */
>
> for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
> {
> @@ -607,26 +609,32 @@ timer::validate_phases (FILE *fp) const
> }
> }
>
> - if (phase_user > total->user * tolerance
> - || phase_sys > total->sys * tolerance
> - || phase_wall > total->wall * tolerance
> - || phase_ggc_mem > total->ggc_mem * tolerance)
> + if (phase_user > total->user
> + || phase_sys > total->sys
> + || phase_wall > total->wall
> + || phase_ggc_mem > total->ggc_mem)
> {
>
> fprintf (fp, "Timing error: total of phase timers exceeds total
> time.\n");
> if (phase_user > total->user)
> - fprintf (fp, "user %24.18e > %24.18e\n", phase_user, total->user);
> + fprintf (fp, "user %13" PRIu64 " > %13" PRIu64 "\n",
> + phase_user, total->user);
> if (phase_sys > total->sys)
> - fprintf (fp, "sys %24.18e > %24.18e\n", phase_sys, total->sys);
> + fprintf (fp, "sys %13" PRIu64 " > %13" PRIu64 "\n",
> + phase_sys, total->sys);
> if (phase_wall > total->wall)
> - fprintf (fp, "wall %24.18e > %24.18e\n", phase_wall, total->wall);
> + fprintf (fp, "wall %13" PRIu64 " > %13" PRIu64 "\n",
> + phase_wall, total->wall);
> if (phase_ggc_mem > total->ggc_mem)
> - fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
> + fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
> (unsigned long)total->ggc_mem);
> gcc_unreachable ();
> }
> }
>
> +#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
> +#define percent_of(TOTAL, SUBTOTAL) \
> + ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
> /* Helper function for timer::print. */
> void
> @@ -640,22 +648,22 @@ timer::print_row (FILE *fp,
> #ifdef HAVE_USER_TIME
> /* Print user-mode time for this process. */
> fprintf (fp, "%7.2f (%3.0f%%)",
> - elapsed.user,
> - (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
> + nanosec_to_floating_sec (elapsed.user),
> + percent_of (total->user, elapsed.user));
> #endif /* HAVE_USER_TIME */
>
> #ifdef HAVE_SYS_TIME
> /* Print system-mode time for this process. */
> fprintf (fp, "%7.2f (%3.0f%%)",
> - elapsed.sys,
> - (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
> + nanosec_to_floating_sec (elapsed.sys),
> + percent_of (total->sys, elapsed.sys));
> #endif /* HAVE_SYS_TIME */
>
> #ifdef HAVE_WALL_TIME
> /* Print wall clock time elapsed. */
> fprintf (fp, "%7.2f (%3.0f%%)",
> - elapsed.wall,
> - (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
> + nanosec_to_floating_sec (elapsed.wall),
> + percent_of (total->wall, elapsed.wall));
> #endif /* HAVE_WALL_TIME */
>
> /* Print the amount of ggc memory allocated. */
> @@ -673,7 +681,8 @@ timer::print_row (FILE *fp,
> bool
> timer::all_zero (const timevar_time_def &elapsed)
> {
> - const double tiny = 5e-3;
> + /* 5000000 nanosec == 5e-3 seconds. */
> + uint64_t tiny = 5000000;
> return (elapsed.user < tiny
> && elapsed.sys < tiny
> && elapsed.wall < tiny
> @@ -766,13 +775,13 @@ timer::print (FILE *fp)
> /* Print total time. */
> fprintf (fp, " %-35s:", "TOTAL");
> #ifdef HAVE_USER_TIME
> - fprintf (fp, "%7.2f ", total->user);
> + fprintf (fp, "%7.2f ", nanosec_to_floating_sec (total->user));
> #endif
> #ifdef HAVE_SYS_TIME
> - fprintf (fp, "%8.2f ", total->sys);
> + fprintf (fp, "%8.2f ", nanosec_to_floating_sec (total->sys));
> #endif
> #ifdef HAVE_WALL_TIME
> - fprintf (fp, "%8.2f ", total->wall);
> + fprintf (fp, "%8.2f ", nanosec_to_floating_sec (total->wall));
> #endif
> fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
>
> @@ -790,6 +799,8 @@ timer::print (FILE *fp)
>
> validate_phases (fp);
> }
> +#undef nanosec_to_floating_sec
> +#undef percent_of
>
> /* Get the name of the topmost item. For use by jit for validating
> inputs to gcc_jit_timer_pop. */
>
>
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)