Hi, all--
On Sep 7, 2009, at 12:21 AM, Luigi Rizzo wrote:
3. ?CAUSE an error in tvtohz(), reported long ago in
? ? ? ?http://www.dragonflybsd.org/presentations/nanosleep/
? ? ? ?which causes a systematic error of an extra tick in the
computation
? ? ? ?of the sleep times.
? ?FIX: the above link also contains a proposed fix (which in fact
? ? ? ?reverts a bug introduced in some old commit on FreeBSD)
? ?PROBLEM: none that i see.
This change, as-is, is extremely dangerous. tsleep/msleep() use a
value of 0 meaning 'forever'. Changing tvtohz() so that it can now
return 0 for a non-zero tv is setting land-mines all over the place.
There's something like 27 callers of tvtohz() in sys/kern alone, some
of which are used to supply tsleep/msleep() timeouts. Note that the
dragonflybsd patch above only adds the 'returns 0' check to one
single
caller. You either need to patch all callers of tvtohz() since
you've
change the semantics, or add a 'if (ticks == 0) ticks = 1' check (or
checks) in the appropriate places inside tvtohz().
If you don't do it, then you end up with callers of random functions
with very small timeouts instead finding themselves sleeping forever.
You are right, a proper fix for this third issue should be different
(if we want a fix at all -- I'd be almost satisfied by just
removing the drift).
A while ago, I took a fairly close look at timing accuracy (ie, trying
to sleep until exactly the next second or whatever, and then comparing
the actual time you were woken up against the target time), did my own
fancy graphs showing sawtooth waves indicating timer aliasing
problems, etc. There's no question that the system is allowed to
sleep longer than requested, and, if the system is busy doing other
tasks, some amount of extra delay is expected and OK, but missing the
target time by more than several ticks, especially if the system is
idle, is bogus.
The Dragonfly patch to tvtohz() definitely improves timer accuracy,
but it ends up computing a tick value which is too small by one
sometimes (ie, the target time lands *within* that scheduler quantum,
rather than at or before the start of that tick), causing negative
offsets from the desired time (ie, you woke up just a bit too soon).
That's also bogus. :-)
From my own testing of various platforms, as well as the results
reported elsewhere, MacOS X has the most consistent timer accuracy,
which is probably not too surprising given that soft realtime
capabilities for audio/visual processing tasks such as iTunes,
Quicktime, LogicStudio, etc are pretty important to the Mac userbase.
I just compared the tvtohz() implementations in kern/kern_clock.c
between FreeBSD and Darwin/XNU (specifically xnu-1228.15.4, the kernel
for 10.5.8), and they are effectively identical. However, that's
probably moot because Darwin uses tvtoabstime() rather than tvtohz()
for the callouts in the setitimer()/getitimer() implementation from
kern/kern_time.c:
/* FreeBSD */
int
kern_setitimer(struct thread *td, u_int which, struct itimerval *aitv,
struct itimerval *oitv)
{
[ ... ]
if (which == ITIMER_REAL) {
PROC_LOCK(p);
if (timevalisset(&p->p_realtimer.it_value))
callout_stop(&p->p_itcallout);
getmicrouptime(&ctv);
if (timevalisset(&aitv->it_value)) {
callout_reset(&p->p_itcallout, tvtohz(&aitv-
>it_value),
realitexpire, p);
timevaladd(&aitv->it_value, &ctv);
}
*oitv = p->p_realtimer;
p->p_realtimer = *aitv;
PROC_UNLOCK(p);
if (timevalisset(&oitv->it_value)) {
if (timevalcmp(&oitv->it_value, &ctv, <))
timevalclear(&oitv->it_value);
else
timevalsub(&oitv->it_value, &ctv);
}
[ ... ]
/* Darwin */
int
setitimer(struct proc *p, struct setitimer_args *uap, register_t
*retval)
{
[ ... ]
switch (uap->which) {
case ITIMER_REAL:
proc_spinlock(p);
if (timerisset(&aitv.it_value)) {
microuptime(&p->p_rtime);
timevaladd(&p->p_rtime, &aitv.it_value);
p->p_realtimer = aitv;
if (!thread_call_enter_delayed(p->p_rcall,
tvtoabstime(&p->p_rtime))) // [1]
p->p_ractive++;
} else {
timerclear(&p->p_rtime);
p->p_realtimer = aitv;
if (thread_call_cancel(p->p_rcall))
p->p_ractive--;
}
proc_spinunlock(p);
break;
[ ... ]
The simplest option is perhaps to compute a custom value for
nanosleep, select and poll. This would remove the risk of side
effects to other parts of the system, and we could also use the
chance to compensate for the errors that arise when hz*tick !=
1000000 or when we know that hardclock does not run exactly every
'tick' (an integer) microseconds.
I think it would be relatively simple to adapt the DragonFly change
but ensure that tztohz() returns at least 1, and that would be helpful
improvement.
However, in the long run, a callout mechanism based upon scheduler
ticks rather than actual time values suffers an inherent loss of
accuracy (you're performing an integer division of time intervals by
HZ and rounding up to use the API, after all) and makes assumptions
that the scheduler's clock is always running at a constant frequency
such that tick * HZ is exactly a million microseconds.
Regards,
--
-Chuck
[1]: See
http://www.opensource.apple.com/source/xnu/xnu-1228.15.4/osfmk/kern/thread_call.c
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"