Re: svn commit: r265472 - head/bin/dd

2014-05-08 Thread Bruce Evans

On Wed, 7 May 2014, Alan Somers wrote:


On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:

On Tue, 6 May 2014, Alan Somers wrote:


This is about some minor details that I didn't reply to for later followups.


+   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
+   err(EX_OSERR, "clock_gettime");
+   if (clock_getres(CLOCK_MONOTONIC_PRECISE, &tv_res))
+   err(EX_OSERR, "clock_getres");



clock_getres() is almost useless, and is useless here.  It is broken
as designed, since the precision may be less than 1 nanosecond but
1 nanosecond is the smallest positive representable value, but that
is not a problem here since clock_gettime() also can't distinguish
differences smaller than 1 nanosecond.


Since it's reporting the clock resolution and not precision, and since
clock_gettime() only reports with 1ns resolution, I don't think it's a
problem for clock_getres to report with 1ns resolution too.


I got most of the backwardness backwards.  The syscall is clock_getres(),
not clock_getprec(), and the variable name matches this.  But what it
returns is the precision.  The resolution is just that of a timespec (1
nanosecond).  No API is needed to report this.  APIs are needed to report:
- the precision.  The API is misnamed clock_getres()
- the granularity.  This is the minimum time between successive measurements.
   It can be determined by actually doing some measurements.
- the accuracy.  No API is available
For clocks based on timecounters, we use the timecounter clock period
rounded up to nanoseconds for the precision.  With a TSC, this is always
1 nanosecond above 1GHz.

dd needs more like the granularity than the precision, but it doesn't
really matter since the runtime must be much larger than the granularity
for the statistics to be accurate, and usually is.
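
For what it's worth, the granularity can be estimated without any new API
by timing back-to-back clock_gettime() calls until the value changes.  A
rough sketch (not part of the dd patch; a single sample is noisy, so the
minimum over many repetitions is what you would really want):

	#include <stdio.h>
	#include <time.h>

	int
	main(void)
	{
		struct timespec a, b;
		long diff_ns;

		clock_gettime(CLOCK_MONOTONIC, &a);
		do {
			clock_gettime(CLOCK_MONOTONIC, &b);
			diff_ns = (b.tv_sec - a.tv_sec) * 1000000000L +
			    (b.tv_nsec - a.tv_nsec);
		} while (diff_ns == 0);
		printf("observed granularity: %ld ns\n", diff_ns);
		return (0);
	}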


The fixup is now only reachable in 3 cases that can't happen:
- when the monotonic time goes backwards due to a kernel bug
- when the monotonic time doesn't increase, so that the difference is 0.
  Oops, this can happen for timecounters with very low precision.
  You don't need to know the precision to check for this.


On my Xeon E5504 systems, I can see adjacent calls to clock_gettime
return equal values when using one of the _FAST clocks.  It can't be
proven that this case will never happen with any other clock either,
so the program needs to handle it.


Hrmph.  This is either from the design error of the existence of the
_FAST clocks, or from the design error of the existence of TSC-low.

First, the _FAST clocks are only supposed to have a resolution of 1/hz.
clock_getres() is quite broken here.  It returns the timecounter
precision for the _FAST clocks too.  Also, if it returned 1/hz, then
it would be inconsistent with the libc implementation of
clock_gettime().  The latter gives the timecounter precision.

TSC-low intentionally destroys the hardware TSC precision by right shifting,
due to FUD and to handle a minor problem above 4GHz.  The shift used to
be excessive in most cases.  On freefall it used to be about 7, so the
precision of ~1/2.67 nsec was reduced to ~48 nsec.  This was easy to
see in test programs for such things.  Now the shift is just 1.
Since 2 (the factor for a shift of 1) is less than 2.67, the loss of
precision from the shift is less than the loss of precision from
converting from bintimes to timespecs.  The shift is still a pessimization.

sysctl has a read-only tunable kern.timecounter.tsc_shift.  Use of this
seems to be quite broken.  The shift count is determined dynamically,
and the tunable barely affects this.  The active shift count is not
written back to the tunable, so you can't easily see what it is.  However,
the shift count is now always 1 except in exceptional cases.  The tunable
defaults to 1.  This is for CPU speeds between 2GHz and 4GHz to implement
the support for the FUD at these speeds.  Above 4GHz, the shift is increased
to 2 without changing the tunable.  Above 8GHz, the shift is increased to
3.  That can't happen yet, but you can tune higher to get a higher
shift count at lower speeds.  You can also tune to 0 to avoid the shift
up to 4GHz.
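
For reference, the sysctl named above can be read at runtime, and since it
is a read-only (boot-time) tunable it would have to be set from loader.conf;
something like the following, assuming the usual tunable mechanism:

	sysctl kern.timecounter.tsc_shift
	# boot-time only, e.g. in /boot/loader.conf:
	kern.timecounter.tsc_shift=0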

The shift, together with some fencing pessimizations that are not even done
in the kernel version (only in libc), is due to FUD.  rdtsc is not a
serializing instruction, so its direct use may give surprising results.  I
think it is serialized with respect to itself on the same CPU.  It is
obviously not serialized with respect to other instructions on the same CPU.
So it doesn't work properly in code like "rdtsc; save results; v++; rdtsc;
compare results", even with quite a bit more than v++ between the rdtsc's.
Normally there is much more than v++ between rdtsc's so code like this
works well in practice.  When the rdtsc's are on separate CPUs, it is
just a bug to depend on their order unless there are synchronization
instructions for more than the rdtsc's.
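
To illustrate the kind of ordering the fences buy (a sketch only, using
compiler intrinsics; lfence orders the TSC read against earlier loads on
the same CPU, while full serialization would need something heavier such
as cpuid):

	#include <stdint.h>
	#include <x86intrin.h>

	static inline uint64_t
	rdtsc_ordered(void)
	{

		/* Keep the TSC read from being hoisted above earlier loads. */
		_mm_lfence();
		return (__rdtsc());
	}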

The old kernel code is sloppy about such things.  It tries to do
everything without 

Re: svn commit: r265472 - head/bin/dd

2014-05-08 Thread Alan Somers
On Wed, May 7, 2014 at 9:39 PM, Bruce Evans b...@optusnet.com.au wrote:
 On Wed, 7 May 2014, Jilles Tjoelker wrote:

 On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:

 On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:

 On Tue, 6 May 2014, Alan Somers wrote:

 ...

  The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE
 as
 the
  clock_id.  That clock advances steadily, regardless of changes to the
 system
  clock.
 ...
 +#include <sysexits.h>


 Use of sysexits.h is a style bug.  It is not used in BSD or KNF code
 like dd used to be.


 sysexits.h is recommended by the err(3) man page.  Is that
 recommendation meant to apply selectively, or is it obsolete, or is
 some sort of edit war being waged by man page authors?


 Bug in the err(3) man page.  Sort of an edit war.  Just 2 FreeBSD
 committers liked sysexits and used it in their code and added a
 recommendation to use it in some man pages.  But it has negative
 advantages, and normal BSD programs don't use it.  It has been
 edited in and out of style(9).


 The recommendation for sysexits.h was incompletely removed, yes.


 It is still in err(3), and sysexits(3) still justifies itself by
 pointing to partly-removed words in style(9).

 err(3) is the last place that should recommend using sysexits.  err()
 gives a nice way of encouraging text descriptions for all exits.
 With text descriptions, there is almost no need for cryptic numeric
 exit codes.  Only sets of programs that communicate a little status
 in the exit code should use sysexits (or perhaps their own exit
 codes, or certain standard exit codes like 126 or 127 for xargs and
 some other utilities).  Some of the uses of the standard exit codes
 are even.  I don't know of any utility except possibly sendmail that
 documents that it uses sysexits enough for its exit codes to be
 useful for more than a binary success/fail decision.  Certainly not
 dd after these changes.  If its use of sysexits were documented,
 then the documentation would say dd uses sysexits to report 3 errors
 that can't happen; otherwise, it uses the normal 2-state exit codes
  (there is a macro for them.  It expands to the concise but
  grammatically challenged "exits 0 on success, and >0 if an error
  occurs".  Here >0 standardises the usual sloppiness of not
  distinguishing codes between 1 and 127).

 sysexits(3) now says:

 % DESCRIPTION
 %  According to style(9), it is not a good practice to call exit(3) with
 %  arbitrary values to indicate a failure condition when ending a program.
 %  Instead, the pre-defined exit codes from sysexits should be used, so the
 %  caller of the process can get a rough estimation about the failure class
 %  without looking up the source code.

 but style(9) now says:

 %  Exits should be 0 on success, or 1 on failure.
 %
 %  exit(0);/*
 %   * Avoid obvious comments such as
 %   * "Exit 0 on success."
 %   */
 %  }

 The latter is not what I asked for either.  In previous discussion
 of this, I think we agreed to at least mention EXIT_SUCCESS and
 EXIT_FAILURE, and possibly deprecate sysexits.

 This is a weakened version of the 4.4BSD style rules, which say:

 %   /*
 %* Exits should be 0 on success, and 1 on failure.  Don't denote
 %* all the possible exit points, using the integers 1 through 300.
 %*/
 %   exit(0);/* Avoid obvious comments such as "Exit 0 on success." */

 The main point of this is to disallow cryptic undocumented exit statuses.
 Recommending sysexits almost reverses this.  It gives cryptic undocumented
 error statuses that are not even easy to decrypt for programs.  Programs
 can look up sysexits, but without documentation there is no guarantee that
 the encoding is according to sysexits.  Actually documenting use of
 sysexits would make it even more painful to use.


 [snip]

 -   st.start = tv.tv_sec + tv.tv_usec * 1e-6;
 +   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
 +   err(EX_OSERR, "clock_gettime");

 [snip]

 +   st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
 }


 The floating point addition starts losing precision after 8388608
 seconds (slightly more than 97 days, a plausible uptime for a server).
 It is better to subtract the timespecs to avoid this issue.


 No, it is better to use floating point for results that only need to
 be approximate.  Especially when the inputs are approximate and the
 final approximation doesn't need to be very accurate.

 Floating point is good for all timespec and timeval calculations,
 except in the kernel where it is unavailable.  timespecs and timevals
 are mostly used for timeouts, and the kernel isn't very careful about
 exact timeouts.  Short timeouts have inherent large inaccuracy due
 to interrupt granularity and latency.  Long timeouts can be relatively
 more accurate, but only if the 

Re: svn commit: r265472 - head/bin/dd

2014-05-08 Thread Bruce Evans

On Thu, 8 May 2014, Alan Somers wrote:


On Wed, May 7, 2014 at 9:39 PM, Bruce Evans b...@optusnet.com.au wrote:

On Wed, 7 May 2014, Jilles Tjoelker wrote:


On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:


On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:


On Tue, 6 May 2014, Alan Somers wrote:

[snip]


+   st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
}



The floating point addition starts losing precision after 8388608
seconds (slightly more than 97 days, a plausible uptime for a server).
It is better to subtract the timespecs to avoid this issue.


No, it is better to use floating point for results that only need to
be approximate.  Especially when the inputs are approximate and the
final approximation doesn't need to be very accurate.

Floating point is good for all timespec and timeval calculations,
except in the kernel where it is unavailable.  timespecs and timevals
are mostly used for timeouts, and the kernel isn't very careful about
exact timeouts.  Short timeouts have inherent large inaccuracy due
to interrupt granularity and latency.  Long timeouts can be relatively
more accurate, but only if the kernel is careful about them.  It is
only careful in some places.


No, Jilles is right.  The problem isn't that dd uses doubles; it's
that dd converts longs to doubles _before_ subtracting the values.
That causes rounding if the tv_sec values are large.  If the
implementation of CLOCK_MONOTONIC ever changed to measure time since
the Epoch, or something similar, then the rounding error would be
extremely significant.  Better to subtract the timespecs, then convert
to double.


Nowhere near a problem.  Conversion from long to double is exact up
to 2**53 seconds or 285 megayears.  So it works for seconds since the
POSIX Epoch although not for seconds since the big bang.  It would be
a problem for floats.

Actually, I didn't notice the fragile conversion order.


With microseconds, the precision of a double is sufficient for 272
years, so that calculation is probably acceptable.


Actually 285 years.  A measly 0.285 years in nanoseconds before exactness
is lost.  The subtraction problem also affects tv_nsec, but is not
very serious since it wouldn't hurt to ignore the nanoseconds part of
a runtime of 0.285 years.  We are 54 years from the Epoch now.  That is
far more than 0.285 years, so we lose a significant part of the nanoseconds
precision.  But we get microseconds precision for 285 years since the
Epoch.  About 5 times that now.
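
In code, the subtract-first order being discussed is just (a sketch, not
the committed dd change):

	#include <time.h>

	/* Subtract the timespecs first, then convert the small difference. */
	static double
	ts_elapsed(const struct timespec *start, const struct timespec *end)
	{

		return ((end->tv_sec - start->tv_sec) +
		    (end->tv_nsec - start->tv_nsec) * 1.0e-9);
	}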


...
Bugs in the boot time can be fixed more easily than by micro-adjusting
the monotonic clock.  Just keep the initial boot time (except adjust it
when it was initially local instead of UTC) and frob the real time
using a different variable.  Export both variables so that applications
can compensate for the frobbing at the cost of some complexity.  E.g.,
in uptime(1):

clock_gettime(CLOCK_UPTIME, &ts);
/*
 * Actually, do the compensations in the kernel for CLOCK_UPTIME.
 * It doesn't need to be monotonic.  But suppose it is the same
 * as the unfixed CLOCK_MONOTONIC and compensate here.
 *
 * Also fix the bogus variable name 'tp'.
 */
sysctl_mumble(boottime);
sysctl_mumble(frobbed_boottime);
uptime = ts.tv_sec +- (boottime.tv_sec - frobbed_boottime.tv_sec);

Note that the compensation may go backwards, so this method doesn't work
in general for monotonic times.  However, it can be used if the compensation
is non-negative or relatively small negative.  dd could use this method.
It already has to fix up for zero times and still has parts of the old
method that fixes up for negative times.  Note that the compensation may
be very large across a suspension.  You might start dd, SIGSTOP it, suspend
the system and restart everything a day later.  The compensation would be
about 1 day.  The average from this wouldn't be very useful, but it would
be the same as if dd was stopped for a day but the system was not suspended.


Wouldn't it be simpler just for the kernel to adjust CLOCK_MONOTONIC
to add suspension time?


That works for forward jumps like ones for suspension.

BTW, I do a similar adjustment for suspension that is actually sitting
in ddb.  I only step the real time.  My accuracy is better than 10
ppm.  Good enough for an ntp server.  The stepping is done in rtcintr()
on seconds rollover interrupts 3 seconds after ddb has exited and
remained inactive, according to measurements made every 64 seconds
rollover interrupt (the delay is to prevent adjustments while single
stepping).

Nothing much notices this stepping, but for monotonic time you have
to do something about scheduled timeouts.  Strictly, stepping the
monotonic clock forward by a lot should cause many timeouts to expire.
This is the correct behaviour, but it won't happen automatically, and
it would cause thundering herds.  Cron should have similar scheduling
problems after the realtime clock 

Re: svn commit: r265472 - head/bin/dd

2014-05-08 Thread Dmitry Morozovsky
On Wed, 7 May 2014, Alan Somers wrote:

[snip]

 Even if nanosecond resolution isn't useful, monotonicity is.  Nobody
 should be using a nonmonotonic clock just to measure durations.  I
 started an audit of all of FreeBSD to look for other programs that use
 gettimeofday to measure durations.  I haven't finished, but I've
 already found a lot, including xz, ping, hastd, fetch, systat, powerd,
 and others.  I don't have time to fix them, though.  Would you be
 interested, or do you know anyone else who would?

From your list, hastd seems to be the most dangerous point to me.

Adding trociny@ as one of the most active hastd-related committers so the issue 
would not be lost in the areas...

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



Re: svn commit: r265472 - head/bin/dd

2014-05-08 Thread Alan Somers
On Thu, May 8, 2014 at 4:55 PM, Dmitry Morozovsky ma...@rinet.ru wrote:
 On Wed, 7 May 2014, Alan Somers wrote:

 [snip]

 Even if nanosecond resolution isn't useful, monotonicity is.  Nobody
 should be using a nonmonotonic clock just to measure durations.  I
 started an audit of all of FreeBSD to look for other programs that use
 gettimeofday to measure durations.  I haven't finished, but I've
 already found a lot, including xz, ping, hastd, fetch, systat, powerd,
 and others.  I don't have time to fix them, though.  Would you be
 interested, or do you know anyone else who would?

 From your list, hastd seems to be the most dangerous point to me.

 Adding trociny@ as one of the most active hastd-related committers so the
 issue would not be lost in the areas...

In hastd's case, the problematic comparison is only used to print
debugging info.  It happens twice, in primary.c lines 1978 and 2039.


 --
 Sincerely,
 D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
 [ FreeBSD committer: ma...@freebsd.org ]
 
 *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
 


Re: svn commit: r265472 - head/bin/dd

2014-05-07 Thread Alan Somers
On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:
 On Tue, 6 May 2014, Alan Somers wrote:

 Log:
  dd(1) uses gettimeofday(2) to compute the throughput statistics.
 However,
  gettimeofday returns the system clock, which may jump forward or back,
  especially if NTP is in use.  If the time jumps backwards, then dd will
 see
  negative elapsed time, round it up to 1usec, and print an absurdly fast
  transfer rate.

  The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as
 the
  clock_id.  That clock advances steadily, regardless of changes to the
 system
  clock.

  Reviewed by:   delphij
  MFC after: 3 days
  Sponsored by:  Spectra Logic
 ...

 Modified: head/bin/dd/dd.c

 ==
 --- head/bin/dd/dd.c    Tue May  6 22:04:50 2014    (r265471)
 +++ head/bin/dd/dd.c    Tue May  6 22:06:39 2014    (r265472)
 @@ -50,7 +50,6 @@ __FBSDID("$FreeBSD$");
 #include <sys/conf.h>
 #include <sys/disklabel.h>
 #include <sys/filio.h>
 -#include <sys/time.h>

 #include <ctype.h>
 #include <err.h>
 @@ -61,6 +60,8 @@ __FBSDID("$FreeBSD$");
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 +#include <sysexits.h>


 Use of sysexits.h is a style bug.  It is not used in BSD or KNF code
 like dd used to be.

sysexits.h is recommended by the err(3) man page.  Is that
recommendation meant to apply selectively, or is it obsolete, or is
some sort of edit war being waged by man page authors?



 +#include <time.h>
 #include <unistd.h>

 #include "dd.h"
 @@ -123,7 +124,7 @@ static void
 setup(void)
 {
 u_int cnt;
 -   struct timeval tv;
 +   struct timespec tv;

 if (in.name == NULL) {
 in.name = "stdin";
 @@ -240,8 +241,9 @@ setup(void)
 ctab = casetab;
 }

 -   (void)gettimeofday(&tv, NULL);


 Not checking for errors, and especially voiding the result to say that
 errors need not be checked for, was sloppy.


 -   st.start = tv.tv_sec + tv.tv_usec * 1e-6;
 +   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
 +   err(EX_OSERR, "clock_gettime");


 The existence of CLOCK_MONOTONIC_PRECISE is a design error.  It is the
 same as the standard CLOCK_MONOTONIC.  Use of the former is just
 gratuitously unportable.  It ensures a compile-time failure on OSes
 (including old versions of FreeBSD) that don't have the
 CLOCK_MONOTONIC_PRECISE mistake.  It gives a runtime error on old
 versions of FreeBSD that have clock_gettime() and CLOCK_MONOTONIC but
 not CLOCK_MONOTONIC_PRECISE.  Apart from this error, clock_gettime()
 is as likely to fail as gettimeofday(), or exit().

 The existence of CLOCK_MONOTONIC_FAST is a larger design error.  It
 is not fast, but just sloppy, broken and not very fast (sloppiness is
 the intentional part of its design).  Of course there is no need to
 use it to speed up the whole 2 clock_gettime() calls in dd.  It is
 almost useless for other applications too.  In non-broken implementations,
 the speed of CLOCK_* tends to be dominated by the syscall time even
 when the timer hardware is slow.  On freefall now, the implementation
 is about half as good as possible, and takes between 32 and 35 nanoseconds
 for all clock ids, with the least accurate clock id TIME_SECOND tying
 with CLOCK_REALTIME for slowest (only about 1 nanosecond slower) since
 it has the worst implementation.
   (The implementation on freefall avoids syscalls provided the timer
   hardware is the TSC.  The hardware timer and binuptime() are used for
   all syscalls including TIME_SECOND.  TIME_SECOND is slowest because
   it does extra work to zero tv_sec after calculating it with maximal
   precision.  Doing it this way gives the correct time in seconds, but
   is incompatible with the kernel (syscall and kernel seconds variable)
   since the kernel calculations are sloppy.  The kernel time for seconds
   and for the sloppy FAST clock ids and for the corresponding kernel
   timestamp functions lags the precise time by up to 1/HZ seconds.  This
   difference can easily be seen from userland.

   The hardware part of the 32-35 nanoseconds is 2-4 nanoseconds.  This is
   in sloppy tests.  Reading the TSC is unserialized, and loops reading it
   may be pipelined too well to time it accurately.  Also, the times read
   from it are not completely accurate.  But clock_gettime() doesn't
 serialize
   it either.  Serializing it would make it much slower (maybe 50
 nanoseconds).

   2-4 nanoseconds is almost as fast as on AthlonXP 10 years ago!  rdtsc
   tends to be slower than that on Intel CPUs and on modern CPUs.
   Synchronization in hardware makes it slower on modern CPUs.  ISTR it
   took about 65 cycles on a previous generation of Intel CPUs.

   32-35 nanoseconds for the libc implementation is actually less than
   half as good as is possible if the hardware part really is only 2-4
   nanoseconds.  There is a lot of overhead from layering and from working
   

Re: svn commit: r265472 - head/bin/dd

2014-05-07 Thread Jilles Tjoelker
On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:
 On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:
  On Tue, 6 May 2014, Alan Somers wrote:
 
  Log:
   dd(1) uses gettimeofday(2) to compute the throughput statistics.
  However,
   gettimeofday returns the system clock, which may jump forward or back,
   especially if NTP is in use.  If the time jumps backwards, then dd will
  see
   negative elapsed time, round it up to 1usec, and print an absurdly fast
   transfer rate.
 
   The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as
  the
   clock_id.  That clock advances steadily, regardless of changes to the
  system
   clock.
 
   Reviewed by:   delphij
   MFC after: 3 days
   Sponsored by:  Spectra Logic
  ...
 
  Modified: head/bin/dd/dd.c
 
  ==
  --- head/bin/dd/dd.c    Tue May  6 22:04:50 2014    (r265471)
  +++ head/bin/dd/dd.c    Tue May  6 22:06:39 2014    (r265472)
  @@ -50,7 +50,6 @@ __FBSDID("$FreeBSD$");
  #include <sys/conf.h>
  #include <sys/disklabel.h>
  #include <sys/filio.h>
  -#include <sys/time.h>
 
  #include <ctype.h>
  #include <err.h>
  @@ -61,6 +60,8 @@ __FBSDID("$FreeBSD$");
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  +#include <sysexits.h>

  Use of sysexits.h is a style bug.  It is not used in BSD or KNF code
  like dd used to be.

 sysexits.h is recommended by the err(3) man page.  Is that
 recommendation meant to apply selectively, or is it obsolete, or is
 some sort of edit war being waged by man page authors?

The recommendation for sysexits.h was incompletely removed, yes.

 [snip]
  -   st.start = tv.tv_sec + tv.tv_usec * 1e-6;
  +   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
  +   err(EX_OSERR, "clock_gettime");
 [snip]
  +   st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
  }

The floating point addition starts losing precision after 8388608
seconds (slightly more than 97 days, a plausible uptime for a server).
It is better to subtract the timespecs to avoid this issue.

With microseconds, the precision of a double is sufficient for 272
years, so that calculation is probably acceptable.

I think I agree with Bruce that using CLOCK_MONOTONIC_PRECISE is not
necessary here; the standard CLOCK_MONOTONIC should suffice.

 [snip]
 Even if nanosecond resolution isn't useful, monotonicity is.  Nobody
 should be using a nonmonotonic clock just to measure durations.  I
 started an audit of all of FreeBSD to look for other programs that use
 gettimeofday to measure durations.  I haven't finished, but I've
 already found a lot, including xz, ping, hastd, fetch, systat, powerd,
 and others.  I don't have time to fix them, though.  Would you be
 interested, or do you know anyone else who would?

I have a local patch for time(1).

Whether the monotonic clock is right also depends on how long the
durations typically are. For very long durations, users might refer to
wall clocks and CLOCK_REALTIME may be more appropriate.

-- 
Jilles Tjoelker


Re: svn commit: r265472 - head/bin/dd

2014-05-07 Thread Alan Somers
On Wed, May 7, 2014 at 2:26 PM, Jilles Tjoelker jil...@stack.nl wrote:
 On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:
 On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:
  On Tue, 6 May 2014, Alan Somers wrote:
 
  Log:
   dd(1) uses gettimeofday(2) to compute the throughput statistics.
  However,
   gettimeofday returns the system clock, which may jump forward or back,
   especially if NTP is in use.  If the time jumps backwards, then dd will
  see
   negative elapsed time, round it up to 1usec, and print an absurdly fast
   transfer rate.
 
   The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as
  the
   clock_id.  That clock advances steadily, regardless of changes to the
  system
   clock.
 
   Reviewed by:   delphij
   MFC after: 3 days
   Sponsored by:  Spectra Logic
  ...
 
  Modified: head/bin/dd/dd.c
 
  ==
  --- head/bin/dd/dd.c    Tue May  6 22:04:50 2014    (r265471)
  +++ head/bin/dd/dd.c    Tue May  6 22:06:39 2014    (r265472)
  @@ -50,7 +50,6 @@ __FBSDID("$FreeBSD$");
  #include <sys/conf.h>
  #include <sys/disklabel.h>
  #include <sys/filio.h>
  -#include <sys/time.h>
 
  #include <ctype.h>
  #include <err.h>
  @@ -61,6 +60,8 @@ __FBSDID("$FreeBSD$");
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  +#include <sysexits.h>

  Use of sysexits.h is a style bug.  It is not used in BSD or KNF code
  like dd used to be.

 sysexits.h is recommended by the err(3) man page.  Is that
 recommendation meant to apply selectively, or is it obsolete, or is
 some sort of edit war being waged by man page authors?

 The recommendation for sysexits.h was incompletely removed, yes.

 [snip]
  -   st.start = tv.tv_sec + tv.tv_usec * 1e-6;
  +   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
  +   err(EX_OSERR, "clock_gettime");
 [snip]
  +   st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
  }

 The floating point addition starts losing precision after 8388608
 seconds (slightly more than 97 days, a plausible uptime for a server).
 It is better to subtract the timespecs to avoid this issue.

 With microseconds, the precision of a double is sufficient for 272
 years, so that calculation is probably acceptable.

Good point.  I'll fix it.  BTW, FreeBSD does not have a timespecsub,
but I notice that NetBSD does.  This seems like a good candidate to
import.

http://netbsd.gw.com/cgi-bin/man-cgi?timespecsub++NetBSD-current


 I think I agree with Bruce that using CLOCK_MONOTONIC_PRECISE is not
 necessary here; the standard CLOCK_MONOTONIC should suffice.

 [snip]
 Even if nanosecond resolution isn't useful, monotonicity is.  Nobody
 should be using a nonmonotonic clock just to measure durations.  I
 started an audit of all of FreeBSD to look for other programs that use
 gettimeofday to measure durations.  I haven't finished, but I've
 already found a lot, including xz, ping, hastd, fetch, systat, powerd,
 and others.  I don't have time to fix them, though.  Would you be
 interested, or do you know anyone else who would?

 I have a local patch for time(1).

 Whether the monotonic clock is right also depends on how long the
 durations typically are. For very long durations, users might refer to
 wall clocks and CLOCK_REALTIME may be more appropriate.

I would say that gettimeofday() or clock_gettime(CLOCK_REALTIME) should be
used whenever the absolute value of the result matters.  For example, if
it's written to disk, displayed to the user, or used as an absolute
wakeup time for something like at(1).  But if only the relative value
of 2 or more calls matters, then I think that a monotonic clock should
always be used.  A corollary is that durations that cross a reboot
cycle, or when the start and end times are measured by different
machines, must use the realtime clock.
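
As a sketch of that rule of thumb (illustrative only):

	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>

	int
	main(void)
	{
		struct timespec begin, end, wallclock;

		/* Relative use: only end - begin matters, so a monotonic clock. */
		clock_gettime(CLOCK_MONOTONIC, &begin);
		/* ... the work being measured ... */
		clock_gettime(CLOCK_MONOTONIC, &end);

		/* Absolute use: the value itself is shown or stored, so realtime. */
		clock_gettime(CLOCK_REALTIME, &wallclock);
		printf("finished at %jd seconds since the Epoch\n",
		    (intmax_t)wallclock.tv_sec);
		return (0);
	}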

-Alan


 --
 Jilles Tjoelker


Re: svn commit: r265472 - head/bin/dd

2014-05-07 Thread Bruce Evans

On Wed, 7 May 2014, Jilles Tjoelker wrote:


On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:

On Tue, May 6, 2014 at 9:47 PM, Bruce Evans b...@optusnet.com.au wrote:

On Tue, 6 May 2014, Alan Somers wrote:

...
 The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as
the
 clock_id.  That clock advances steadily, regardless of changes to the
system
 clock.
...
+#include <sysexits.h>



Use of sysexits.h is a style bug.  It is not used in BSD or KNF code
like dd used to be.



sysexits.h is recommended by the err(3) man page.  Is that
recommendation meant to apply selectively, or is it obsolete, or is
some sort of edit war being waged by man page authors?


Bug in the err(3) man page.  Sort of an edit war.  Just 2 FreeBSD
committers liked sysexits and used it in their code and added a
recommendation to use it in some man pages.  But it has negative
advantages, and normal BSD programs don't use it.  It has been
edited in and out of style(9).


The recommendation for sysexits.h was incompletely removed, yes.


It is still in err(3), and sysexits(3) still justifies itself by
pointing to partly-removed words in style(9).

err(3) is the last place that should recommend using sysexits.  err()
gives a nice way of encouraging text descriptions for all exits.
With text descriptions, there is almost no need for cryptic numeric
exit codes.  Only sets of programs that communicate a little status
in the exit code should use sysexits (or perhaps their own exit
codes, or certain standard exit codes like 126 or 127 for xargs and
some other utilities).  Some of the uses of the standard exit codes
are even.  I don't know of any utility except possibly sendmail that
documents that it uses sysexits enough for its exit codes to be
useful for more than a binary success/fail decision.  Certainly not
dd after these changes.  If its use of sysexits were documented,
then the documentation would say dd uses sysexits to report 3 errors
that can't happen; otherwise, it uses the normal 2-state exit codes
(there is a macro for them.  It expands to the concise but
grammatically challenged "exits 0 on success, and >0 if an error
occurs".  Here >0 standardises the usual sloppiness of not
distinguishing codes between 1 and 127).

sysexits(3) now says:

% DESCRIPTION
%  According to style(9), it is not a good practice to call exit(3) with
%  arbitrary values to indicate a failure condition when ending a program.
%  Instead, the pre-defined exit codes from sysexits should be used, so the
%  caller of the process can get a rough estimation about the failure class
%  without looking up the source code.

but style(9) now says:

%  Exits should be 0 on success, or 1 on failure.
% 
%  exit(0);/*
%   * Avoid obvious comments such as
%   * "Exit 0 on success."
%   */
%  }

The latter is not what I asked for either.  In previous discussion
of this, I think we agreed to at least mention EXIT_SUCCESS and
EXIT_FAILURE, and possibly deprecate sysexits.

This is a weakened version of the 4.4BSD style rules, which say:

%   /*
%* Exits should be 0 on success, and 1 on failure.  Don't denote
%* all the possible exit points, using the integers 1 through 300.
%*/
%   exit(0);/* Avoid obvious comments such as "Exit 0 on success." */

The main point of this is to disallow cryptic undocumented exit statuses.
Recommending sysexits almost reverses this.  It gives cryptic undocumented
error statuses that are not even easy to decrypt for programs.  Programs
can look up sysexits, but without documentation there is no guarantee that
the encoding is according to sysexits.  Actually documenting use of
sysexits would make it even more painful to use.
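
For contrast, the plain BSD style is just a text description via err(3) and
the ordinary 2-state exit status, e.g. (illustrative only, not dd's code):

	#include <err.h>
	#include <stdlib.h>
	#include <time.h>

	int
	main(void)
	{
		struct timespec tv;

		/* The message carries the detail; the status is just 0 or 1. */
		if (clock_gettime(CLOCK_MONOTONIC, &tv) == -1)
			err(1, "clock_gettime");
		exit(0);
	}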


[snip]

-   st.start = tv.tv_sec + tv.tv_usec * 1e-6;
+   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
+   err(EX_OSERR, "clock_gettime");

[snip]

+   st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
}


The floating point addition starts losing precision after 8388608
seconds (slightly more than 97 days, a plausible uptime for a server).
It is better to subtract the timespecs to avoid this issue.


No, it is better to use floating point for results that only need to
be approximate.  Especially when the inputs are approximate and the
final approximation doesn't need to be very accurate.

Floating point is good for all timespec and timeval calculations,
except in the kernel where it is unavailable.  timespecs and timevals
are mostly used for timeouts, and the kernel isn't very careful about
exact timeouts.  Short timeouts have inherent large inaccuracy due
to interrupt granularity and latency.  Long timeouts can be relatively
more accurate, but only if the kernel is careful about them.  It is
only careful in some places.


With microseconds, the precision of a double is sufficient for 272
years, so that calculation is 

Re: svn commit: r265472 - head/bin/dd

2014-05-07 Thread Bruce Evans

On Wed, 7 May 2014, Alan Somers wrote:


On Wed, May 7, 2014 at 2:26 PM, Jilles Tjoelker jil...@stack.nl wrote:

The floating point addition starts losing precision after 8388608
seconds (slightly more than 97 days, a plausible uptime for a server).
It is better to subtract the timespecs to avoid this issue.

With microseconds, the precision of a double is sufficient for 272
years, so that calculation is probably acceptable.


Good point.  I'll fix it.  BTW, FreeBSD does not have a timespecsub,
but I notice that NetBSD does.  This seems like a good candidate to
import.

http://netbsd.gw.com/cgi-bin/man-cgi?timespecsub++NetBSD-current


Hrmph.  This was intentionally left out in FreeBSD.  Floating point
works better.  Unfortunately, floating point is unavailable in kernels,
or was too slow, especially in 1980, so APIs use the not so good
timespec and timeval APIs.  Not to mention bintimes, sbintimes and
struct tm.  sbintime or just nanoseconds would be better if floating
point is unavailable.  struct tm is too bloated to be good for syscalls
although not bloated enough to support sub-second times.

FreeBSD does have the corresponding design errors for timeval APIs:
timeradd(), etc.  These are even documented, but in a wrong way as
prototypes.  They are actually unsafe macros spelled as if they
are safe.  Some of them could be functions, but timercmp() must be
a macro.  Design errors in them start with their names (timer* instead
of tv*).

sys/time.h has rather complicated ifdefs to keep these APIs in or
out of the kernel.  It already has timespecsub(), but limits it to
the kernel.  The efficiency of these functions is unimportant, and
some like timevalsub() are extern functions in the kernel.  This
helps give the complicated ifdefs and confusing namespaces:
- bintime_sub() is kernel inline
- bintime_sub() is user inline (undocumented pollution)
- bintime_sub() uses an extra underscore
- the good name btsub() is not used
- timevalsub() is kernel extern
- timersub() is user macro (other-BSD compatibility cruft)
- the good name tvsub() is not used.  timersub() is an especially bad name
  for an API that handles timevals.  timevals were intentionally left out
  of old versions of POSIX.  POSIX invented timespecs instead.  The old
  APIs using timevals couldn't be killed so easily, and POSIX eventually
  had to standardize them.  POSIX also standardized many new timespec APIs.
  Then, just when the timevals could be killed better, the unnecessary
  APIs for manipulating them were introduced, though not in POSIX.  The
  non-kernel versions of them have a name that is particularly bad when there
  is more than 1 timer struct.
- timespecsub() is kernel macro
- timespecsub() as user API is intentionally left out
- the good name tssub() is not used.
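
For reference, the 3-argument form under discussion is roughly this (a
sketch of the usual macro, target last as in timersub(); not FreeBSD's
user-visible header):

	#define	timespecsub(tsp, usp, vsp)				\
		do {							\
			(vsp)->tv_sec = (tsp)->tv_sec - (usp)->tv_sec;	\
			(vsp)->tv_nsec = (tsp)->tv_nsec - (usp)->tv_nsec; \
			if ((vsp)->tv_nsec < 0) {			\
				(vsp)->tv_sec--;			\
				(vsp)->tv_nsec += 1000000000L;		\
			}						\
		} while (0)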

Parameter names and orders for these APIs are also inconsistent and
mostly bad.  In the macro timersub(), the parameters are (tvp, uvp, vvp).
You might think tvp is the target, but it is one of the sources.  In
the man page prototype, they are (a, b, res).  Slightly better, but
now missing a 'p'.  But 'p' is not missing in the man page for
timerset() or timerclear().  The ordering with the target last is an
old convention.  bintime functions mostly but not always go the other way.

Floating point is easier to use than these functions, except for
conversion from floating point back to a timespec or timeval.  That
takes about 2 statements instead of 1 function call.  It's just
a bit inelegant to go through floating point when you start with
timespecs and have to convert back to timespecs.
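
Those 2 statements are roughly (a sketch, for a nonnegative value):

	#include <time.h>

	static struct timespec
	dtots(double d)
	{
		struct timespec ts;

		ts.tv_sec = (time_t)d;
		ts.tv_nsec = (long)((d - (double)ts.tv_sec) * 1e9);
		return (ts);
	}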

Bruce


Re: svn commit: r265472 - head/bin/dd

2014-05-06 Thread Bruce Evans

On Tue, 6 May 2014, Alan Somers wrote:


Log:
 dd(1) uses gettimeofday(2) to compute the throughput statistics.  However,
 gettimeofday returns the system clock, which may jump forward or back,
 especially if NTP is in use.  If the time jumps backwards, then dd will see
 negative elapsed time, round it up to 1usec, and print an absurdly fast
 transfer rate.

 The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as the
 clock_id.  That clock advances steadily, regardless of changes to the system
 clock.

 Reviewed by:   delphij
 MFC after: 3 days
 Sponsored by:  Spectra Logic
...
Modified: head/bin/dd/dd.c
==
--- head/bin/dd/dd.c    Tue May  6 22:04:50 2014    (r265471)
+++ head/bin/dd/dd.c    Tue May  6 22:06:39 2014    (r265472)
@@ -50,7 +50,6 @@ __FBSDID("$FreeBSD$");
#include <sys/conf.h>
#include <sys/disklabel.h>
#include <sys/filio.h>
-#include <sys/time.h>

#include <ctype.h>
#include <err.h>
@@ -61,6 +60,8 @@ __FBSDID("$FreeBSD$");
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include <sysexits.h>


Use of sysexits.h is a style bug.  It is not used in BSD or KNF code
like dd used to be.


+#include <time.h>
#include <unistd.h>

#include "dd.h"
@@ -123,7 +124,7 @@ static void
setup(void)
{
u_int cnt;
-   struct timeval tv;
+   struct timespec tv;

if (in.name == NULL) {
in.name = "stdin";
@@ -240,8 +241,9 @@ setup(void)
ctab = casetab;
}

-   (void)gettimeofday(&tv, NULL);


Not checking for errors, and especially voiding the result to say that
errors need not be checked for, was sloppy.


-   st.start = tv.tv_sec + tv.tv_usec * 1e-6;
+   if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
+   err(EX_OSERR, "clock_gettime");


The existence of CLOCK_MONOTONIC_PRECISE is a design error.  It is the
same as the standard CLOCK_MONOTONIC.  Use of the former is just
gratuitously unportable.  It ensures a compile-time failure on OSes
(including old versions of FreeBSD) that don't have the
CLOCK_MONOTONIC_PRECISE mistake.  It gives a runtime error on old
versions of FreeBSD that have clock_gettime() and CLOCK_MONOTONIC but
not CLOCK_MONOTONIC_PRECISE.  Apart from this error, clock_gettime()
is as likely to fail as gettimeofday(), or exit().
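
If portability mattered, the compile-time failure could be papered over with
a fallback like this sketch, though that does nothing for the run-time
failure on old kernels that reject the id:

	#include <time.h>

	#ifndef CLOCK_MONOTONIC_PRECISE
	#define	CLOCK_MONOTONIC_PRECISE	CLOCK_MONOTONIC
	#endif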

The existence of CLOCK_MONOTONIC_FAST is a larger design error.  It
is not fast, but just sloppy, broken and not very fast (sloppiness is
the intentional part of its design).  Of course there is no need to
use it to speed up the whole 2 clock_gettime() calls in dd.  It is
almost useless for other applications too.  In non-broken implementations,
the speed of CLOCK_* tends to be dominated by the syscall time even
when the timer hardware is slow.  On freefall now, the implementation
is about half as good as possible, and takes between 32 and 35 nanoseconds
for all clock ids, with the least accurate clock id TIME_SECOND tying
with CLOCK_REALTIME for slowest (only about 1 nanosecond slower) since
it has the worst implementation.
  (The implementation on freefall avoids syscalls provided the timer
  hardware is the TSC.  The hardware timer and binuptime() are used for
  all syscalls including TIME_SECOND.  TIME_SECOND is slowest because
  it does extra work to zero tv_sec after calculating it with maximal
  precision.  Doing it this way gives the correct time in seconds, but
  is incompatible with the kernel (syscall and kernel seconds variable)
  since the kernel calculations are sloppy.  The kernel time for seconds
  and for the sloppy FAST clock ids and for the corresponding kernel
  timestamp functions lags the precise time by up to 1/HZ seconds.  This
  difference can easily be seen from userland.

  The hardware part of the 32-35 nanoseconds is 2-4 nanoseconds.  This is
  in sloppy tests.  Reading the TSC is unserialized, and loops reading it
  may be pipelined too well to time it accurately.  Also, the times read
  from it are not completely accurate.  But clock_gettime() doesn't serialize
  it either.  Serializing it would make it much slower (maybe 50 nanoseconds).

  2-4 nanoseconds is almost as fast as on AthlonXP 10 years ago!  rdtsc
  tends to be slower than that on Intel CPUs and on modern CPUs.
  Synchronization in hardware makes it slower on modern CPUs.  ISTR it
  took about 65 cycles on a previous generation of Intel CPUs.

  32-35 nanoseconds for the libc implementation is actually less than
  half as good as is possible if the hardware part really is only 2-4
  nanoseconds.  There is a lot of overhead from layering and from working
  around hardware bugs that usually don't exist.
  )
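
For what it's worth, the per-call figure quoted above can be estimated with
a trivial loop like this (a rough sketch; averaging over many calls hides
serialization and caching effects):

	#include <stdio.h>
	#include <time.h>

	int
	main(void)
	{
		struct timespec t0, t1, tmp;
		int i, n = 1000000;
		double ns;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < n; i++)
			clock_gettime(CLOCK_MONOTONIC, &tmp);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
		    (t1.tv_nsec - t0.tv_nsec)) / n;
		printf("~%.1f ns per clock_gettime() call\n", ns);
		return (0);
	}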

Bad implementations can be quite bad.  With an i8254 timecounter, it takes
about 5000 nanoseconds to read the hardware.  On a 486, it took an additional
1-2 nanoseconds for overheads including a syscall.  On a Pentium1,
it only took 1000-2000 nanoseconds for overheads in old versions of FreeBSD