Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Robert Elz
Date:Sun, 30 Jul 2017 16:04:38 - (UTC)
From:mlel...@serpens.de (Michael van Elst)
Message-ID:  

  | There are slower emulated systems that don't have these issues. (*)

Yes, it was (really, always) becoming obvious that qemu's execution
speed is not the cause.

  | If the host misses interrupts, time in the guest just passes slower
  | than real-time. But inside the guest it is consistent.

If we could achieve that (which changing the timecounter in qemu
apparently achieves) it would at least make the world become rational.
Of course, keeping the timing running faster would be better - if we were
able to get to a state where the client/guest were actually able to talk
to the outside world (that part is easy) and run NTP, and act as a time
server that others could trust, that would be ideal.

  | This is not to be confused with the kernel idea of wall-clock time
  | (i.e. what date reports). wall-clock time is usually maintained
  | by hardware separated from the interrupt timers. The 'date; sleep 5; date'
  | sequence therefore can show that 10 seconds passed.

But that is totally broken.   While there is no guarantee that a sleep will
wake up after exactly the time requested, it should be as close as is
reasonably possible - and on an unloaded system, where there is sufficient
RAM, and nothing swapped out, and nothing competing for cpu cycles, that
sequence should (always) show that between 5 and a bit over 5 seconds
have passed.   If the cpu is busy, or things are getting swapped/paged out,
then we can expect slower (not only for processes waiting upon timer signals,
but for everything), and that's acceptable.

But otherwise, inconsistent timing is not acceptable.   All kinds of
applications (including network protocols) require time to be kept in a
way that is at least close to what others observe, even if not identical.

One easy (poor) fix is simply to do as used to be done, and have kernel
wall clock time maintained by the tick interrupt - that makes things
consistent, but without any real expectation of accuracy.  The alternative
is to make the tick counts depend upon the external wall clock time source,
so they keep in sync - much the same as the power companies do with frequency,
over any short period, the nominal 50/60 Hz frequency can drift around a lot,
but when measured over any reasonable period, those things are highly accurate
(which is why old AC frequency based tick systems used to have very good
long term time stability, provided they never lost clock interrupts.)

  | The problem with qemu is that it's running on a NetBSD host and
  | therefore cannot issue interrupts based on host time unless the
  | host has a larger HZ value.

In the system of most interest, the host, and the guest, are the exact
same system (the exact same binary kernel) - unless we alter the config
of one of them explicitly to avoid this issue, they cannot help but have
the same HZ value.

As long as the emulated qemu client has access to a reasonably accurate ToD
value (which it obviously does, as the host's time is available to qemu, and
can be, and is it seems, made available to the guest) there's no reason at
all the guest cannot produce the correct number of ticks.

And doing so (since it is just a generic NetBSD) would solve the similar,
but less blatant issue for any other system using ticks, where the occasional
clock interrupt might get lost, and where there is some other ToD source
available.

  | With host and guest running at HZ=100, it's obvious that interrupts
  | mostly come just too late and require two ticks on the host, thus
  | slowing down guest time by a factor of two.

Yes, that is a very good explanation for the observed behaviour, and I
cannot help but be grateful that simply beginning to discuss this issue
has provided so many insights into what is happening, and what we can do
to fix things.
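Michael's two-ticks-per-tick explanation can be sketched numerically. The following is a hypothetical model (HOST_TICK_MS, host_wakeup_ms and guest_elapsed_ms are invented names, not qemu or NetBSD code), assuming the host rounds each sleep up to the first tick boundary strictly after the requested deadline, as described in the BUGS section of nanosleep(2):

```c
#include <assert.h>

/*
 * Hypothetical model of the rounding effect: a HZ=100 host can only
 * wake a sleeper on its own 10ms tick boundaries, and a sleep must
 * last "at least" the requested time, so it ends on the first tick
 * boundary strictly after the deadline.
 */
#define HOST_TICK_MS 10

/* Host wakeup time for a sleep of dur ms requested at time now (ms). */
static long host_wakeup_ms(long now, long dur)
{
	long due = now + dur;

	return (due / HOST_TICK_MS + 1) * HOST_TICK_MS;	/* round up past due */
}

/* Host time consumed by n back-to-back 10ms guest timer periods. */
static long guest_elapsed_ms(int n)
{
	long t = 0;

	for (int i = 0; i < n; i++)
		t = host_wakeup_ms(t, 10);
	return t;
}
```

Under these assumptions every 10ms guest period costs 20ms of host time, i.e. guest time runs at exactly half rate, matching a "sleep 10" that takes 20 real seconds.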

When there is no alternative than tick interrupts, we can, and do, use
those to measure time, and everything works - except that if the ticks are
not received at the expected rate, time keeping drifts away from real time
(but invisibly when considered only within the system.)

When there is some better measure of real time available, we can use it to
keep all time keeping synchronised, regardless of whether the
system is "tickless" or still tick based - it isn't required that every
single tick be 1/HZ apart (they never are precisely anyway) just that over
the long term (which in computing is a half second or so) the correct number
of ticks have occurred.
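That "correct number of ticks over the long term" criterion fits in a few lines. A sketch (invented names, not NetBSD kernel code), assuming a trusted monotonic nanosecond time source is available:

```c
#include <assert.h>
#include <stdint.h>

#define HZ 100	/* nominal tick rate, assumed for illustration */

/*
 * How many ticks are still owed, given the ticks actually delivered
 * and the elapsed time on some trusted monotonic source.  Individual
 * ticks need not be exactly 1/HZ apart; only this running total has
 * to come out right over, say, each half second.
 * (elapsed_ns * HZ stays within 64 bits for several years of uptime.)
 */
static int64_t ticks_owed(uint64_t elapsed_ns, uint64_t ticks_delivered)
{
	uint64_t expected = elapsed_ns * HZ / 1000000000ULL;

	return (int64_t)(expected - ticks_delivered);
}
```

A half second at HZ=100 should have produced 50 ticks; if only 40 arrived, ten are owed and can be made up.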

I think it should be possible to make that happen, and that is what I am
going to see if I can do.   Then we can see if we can find a (good enough)
way to make nanosleep() less ticky - whether by giving up on ticks
altogether (which is probably not the best solution - even if we don't
use ticks for timing, we'd end up emulating them for other things, if only
to avoid needing to rewrite too much).

Re: kmem_alloc(0, f)

2017-07-30 Thread Martin Husemann
On Sun, Jul 30, 2017 at 03:23:50PM -, Michael van Elst wrote:
> So what does kmem_alloc(0, KM_SLEEP) do? Fail where KM_SLEEP says it
> cannot fail? I don't think that it can return a zero sized allocation
> (i.e. ptr != NULL that cannot be dereferenced).

Sure it could, return a pointer inside some red zone unmapped (but reserved
kva) page. On typical setups and modulo sysctl vm.user_va0_disable
e.g. "return (void*)16;" just as a simple example.
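That idea can be modelled in userland (model_alloc/model_free and ZERO_SIZE_PTR are invented names; the real kmem layer would hand out an address inside reserved-but-unmapped kva rather than a small constant):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * A distinguished non-NULL address that is never mapped, so any
 * dereference faults immediately; (void *)16 per the example above.
 */
#define ZERO_SIZE_PTR ((void *)16)

static void *
model_alloc(size_t len)
{
	return len == 0 ? ZERO_SIZE_PTR : malloc(len);
}

static void
model_free(void *p, size_t len)
{
	if (p == ZERO_SIZE_PTR) {
		assert(len == 0);	/* catches mismatched size bookkeeping */
		return;
	}
	free(p);
}
```

A zero-size allocation thus succeeds (non-NULL, so KM_SLEEP's promise holds) yet can never be dereferenced without faulting.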

Martin


Re: kmem API to allocate arrays

2017-07-30 Thread Martin Husemann
On Sun, Jul 30, 2017 at 03:30:59PM -, Michael van Elst wrote:
> Reallocation is usually a reason for memory fragmentation. I would
> rather try to avoid it instead of making it easier.

Agreed. Also for kernel drivers, resizing an array allocation is
a very rare operation and no good reason to overcomplicate the API.

Martin


Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Michael van Elst
g...@gson.org (Andreas Gustafsson) writes:

>Frank Kardel wrote:
>> Fixing that requires some more work. But I am surprised that the qemu 
>> interrupt rate is seemingly somewhat around 50Hz.

It shouldn't have a problem on Linux.
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: kmem API to allocate arrays

2017-07-30 Thread Kamil Rytarowski
On 30.07.2017 16:51, Taylor R Campbell wrote:
>> Date: Sun, 30 Jul 2017 16:24:07 +0200
>> From: Kamil Rytarowski 
>>
>> I would allow size to be 0, like with the original reallocarr(3). It
>> might be less pretty, but more compatible with the original model and
>> less vulnerable to accidental panics for no good reason.
> 
> Hard to imagine a legitimate use case for size = 0.  Almost always,
> the parameter will be sizeof(struct foo), or some kind of blocksize
> which necessarily has to be nonzero.
> 
> I started writing some example code, and I'm not too keen on having to
> write kmem_reallocarr for initial allocation and final freeing, so if
> we adopted this, I'd like to have
> 
> int   kmem_allocarr(void *ptrp, size_t size, size_t count, km_flag_t flags);
> int   kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt,
>   km_flag_t flags);
> void  kmem_freearr(void *ptrp, size_t size, size_t count);
> 
> ...at which point it's actually not clear to me that we have much of a
> use for kmem_reallocarr.  Maybe we do -- I haven't surveyed many
> users.
> 
> This still doesn't address the question of whether or how we should
> express bounds on the allowed sizes of the arrays.
> 

I see, perhaps it's legitimate to avoid realloc due to fragmentation.
Without this reallocarr has little point.





Re: kmem API to allocate arrays

2017-07-30 Thread Michael van Elst
campbell+netbsd-tech-k...@mumble.net (Taylor R Campbell) writes:

>Initially I was reluctant to do that because (a) we don't even have a
>kmem_realloc, perhaps for some particular reason, and (b) it requires
>an extra parameter for the old size.  But I don't know any particular
>reason in (a), and perhaps (b) not so bad after all.  Here's a draft:

Reallocation is usually a reason for memory fragmentation. I would
rather try to avoid it instead of making it easier.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: kmem_alloc(0, f)

2017-07-30 Thread Michael van Elst
mar...@duskware.de (Martin Husemann) writes:

>On Sat, Jul 29, 2017 at 02:04:42PM +, Taylor R Campbell wrote:
>> This seems like a foot-oriented panic gun, and it's been a source of
>> problems in the past.  Can we change it?

>I think it is a valuable tool to catch driver bugs early during
>development, but wouldn't mind reducing it to a KASSERT.

So what does kmem_alloc(0, KM_SLEEP) do? Fail where KM_SLEEP says it
cannot fail? I don't think that it can return a zero sized allocation
(i.e. ptr != NULL that cannot be dereferenced).

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Mouse
>> # time sleep 10
>>10.02 real 0.00 user 0.00 sys

>> This actually took 20 seconds of real time (manually timed with a
>> stopwatch).

> [...], but an error of a factor 2 looks suspicious.

This is tickling old memories.  I think I ran into a case where
requesting timer ticks at 100Hz actually got them at 50Hz instead, even
though the kernel was running with 100Hz ticks.  I've done some
searching and completely failed to find either the program exhibiting
the symptom (I _think_ it was userland) or the fix, but it might be
worth looking into the possibility that this is another manifestation
of the same underlying problem, whatever it was.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: kmem API to allocate arrays

2017-07-30 Thread Taylor R Campbell
> Date: Sun, 30 Jul 2017 16:24:07 +0200
> From: Kamil Rytarowski 
> 
> I would allow size to be 0, like with the original reallocarr(3). It
> might be less pretty, but more compatible with the original model and
> less vulnerable to accidental panics for no good reason.

Hard to imagine a legitimate use case for size = 0.  Almost always,
the parameter will be sizeof(struct foo), or some kind of blocksize
which necessarily has to be nonzero.

I started writing some example code, and I'm not too keen on having to
write kmem_reallocarr for initial allocation and final freeing, so if
we adopted this, I'd like to have

int kmem_allocarr(void *ptrp, size_t size, size_t count, km_flag_t flags);
int kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt,
km_flag_t flags);
void kmem_freearr(void *ptrp, size_t size, size_t count);

...at which point it's actually not clear to me that we have much of a
use for kmem_reallocarr.  Maybe we do -- I haven't surveyed many
users.

This still doesn't address the question of whether or how we should
express bounds on the allowed sizes of the arrays.
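One way to read the "not clear we have much use for kmem_reallocarr" remark: allocarr and freearr are just the ocnt==0 and ncnt==0 cases of reallocarr. A userland sketch of that relationship (model_* names are invented; malloc stands in for kmem, and the overflow guard from the draft in this thread is omitted here for brevity):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for the proposed kmem_reallocarr. */
static int
model_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt)
{
	void *optr, *nptr = NULL;

	memcpy(&optr, ptrp, sizeof(void *));
	if (ncnt != 0) {
		nptr = malloc(size * ncnt);
		if (nptr == NULL)
			return ENOMEM;
		if (ocnt != 0)
			memcpy(nptr, optr,
			    size * (ocnt < ncnt ? ocnt : ncnt));
	}
	free(optr);			/* free(NULL) is a no-op */
	memcpy(ptrp, &nptr, sizeof(void *));
	return 0;
}

/* The proposed allocarr/freearr fall out as special cases. */
static int
model_allocarr(void *ptrp, size_t size, size_t count)
{
	return model_reallocarr(ptrp, size, 0, count);
}

static void
model_freearr(void *ptrp, size_t size, size_t count)
{
	(void)model_reallocarr(ptrp, size, count, 0);
}
```

Callers pass a pointer to their typed pointer, which comes back NULL after the free case; that keeps the "(ocnt == 0) == (ptr == NULL)" invariant from the draft.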


Re: kmem API to allocate arrays

2017-07-30 Thread Kamil Rytarowski
On 30.07.2017 15:45, Taylor R Campbell wrote:
>> Date: Sun, 30 Jul 2017 10:22:11 +0200
>> From: Kamil Rytarowski 
>>
>> I think we should go for kmem_reallocarr(). It has been designed for
>> overflows like reallocarray(3) with an option to be capable to resize a
>> table from 1 to N elements and back from N to 0 including freeing.
> 
> Initially I was reluctant to do that because (a) we don't even have a
> kmem_realloc, perhaps for some particular reason, and (b) it requires
> an extra parameter for the old size.  But I don't know any particular
> reason in (a), and perhaps (b) not so bad after all.  Here's a draft:
> 
> int
> kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt, int flags)
> {
> 	void *optr, *nptr;
> 
> 	KASSERT(size != 0);
> 	if (__predict_false((size|ncnt) >= SQRT_SIZE_MAX &&
> 	    ncnt > SIZE_MAX/size))
> 		return ENOMEM;
> 
> 	memcpy(&optr, ptrp, sizeof(void *));
> 	KASSERT((ocnt == 0) == (optr == NULL));
> 	if (ncnt == 0) {
> 		nptr = NULL;
> 	} else {
> 		nptr = kmem_alloc(size*ncnt, flags);
> 		KASSERT(nptr != NULL || flags == KM_NOSLEEP);
> 		if (nptr == NULL)
> 			return ENOMEM;
> 	}
> 	KASSERT((ncnt == 0) == (nptr == NULL));
> 	if (ocnt && ncnt)
> 		memcpy(nptr, optr, size*MIN(ocnt, ncnt));
> 	if (ocnt != 0)
> 		kmem_free(optr, size*ocnt);
> 	memcpy(ptrp, &nptr, sizeof(void *));
> 
> 	return 0;
> }
> 

I would allow size to be 0, like with the original reallocarr(3). It
might be less pretty, but more compatible with the original model and
less vulnerable to accidental panics for no good reason.





Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Andreas Gustafsson
Frank Kardel wrote:
> Fixing that requires some more work. But I am surprised that the qemu 
> interrupt rate is seemingly somewhat around 50Hz.
> Could it be a bug in qemu getting the frequency not right. qemu should
> read the clock to get the frequencies right and possibly skip
> usleeps less than 1/HZ possibly managing an error-budget. I haven't
> looked into qemu at all, but an error of a factor 2 looks suspicious.

I fully agree.
-- 
Andreas Gustafsson, g...@gson.org


Re: kmem API to allocate arrays

2017-07-30 Thread Taylor R Campbell
> Date: Sun, 30 Jul 2017 10:22:11 +0200
> From: Kamil Rytarowski 
> 
> I think we should go for kmem_reallocarr(). It has been designed for
> overflows like realocarray(3) with an option to be capable to resize a
> table fron 1 to N elements and back from N to 0 including freeing.

Initially I was reluctant to do that because (a) we don't even have a
kmem_realloc, perhaps for some particular reason, and (b) it requires
an extra parameter for the old size.  But I don't know any particular
reason in (a), and perhaps (b) not so bad after all.  Here's a draft:

int
kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt, int flags)
{
	void *optr, *nptr;

	KASSERT(size != 0);
	if (__predict_false((size|ncnt) >= SQRT_SIZE_MAX &&
	    ncnt > SIZE_MAX/size))
		return ENOMEM;

	memcpy(&optr, ptrp, sizeof(void *));
	KASSERT((ocnt == 0) == (optr == NULL));
	if (ncnt == 0) {
		nptr = NULL;
	} else {
		nptr = kmem_alloc(size*ncnt, flags);
		KASSERT(nptr != NULL || flags == KM_NOSLEEP);
		if (nptr == NULL)
			return ENOMEM;
	}
	KASSERT((ncnt == 0) == (nptr == NULL));
	if (ocnt && ncnt)
		memcpy(nptr, optr, size*MIN(ocnt, ncnt));
	if (ocnt != 0)
		kmem_free(optr, size*ocnt);
	memcpy(ptrp, &nptr, sizeof(void *));

	return 0;
}


Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Frank Kardel

Hi Andreas !

On 07/30/17 15:20, Andreas Gustafsson wrote:
> Frank Kardel wrote:
>> Could you check which timecounter is used under qemu?
>>
>> sysctl kern.timecounter.hardware
>
> # sysctl kern.timecounter.hardware
> kern.timecounter.hardware = hpet0
>
>> Usually the timecounters are hardware-based and have no relation
>> to the clockinterrupt. In case of qemu you might get a good
>> emulated timecounter, but a suboptimal clockinterrupt.
>> If this is the case it helps to use the clockinterrupt
>> itself as timecounter for the wall clock time to avoid a discrepancy
>> between clockinterrupt-driven timeout handling and wall-clock time tracking.
>>
>> sysctl -w kern.timecounter.hardware=clockinterrupt
>
> # sysctl -w kern.timecounter.hardware=clockinterrupt
> kern.timecounter.hardware: hpet0 -> clockinterrupt
> # time sleep 10
>10.02 real 0.00 user 0.00 sys
>
> This actually took 20 seconds of real time (manually timed with a
> stopwatch).
>
>> This is the opposite from deducing the missed clock interrupts
>> from the wall clock time and keeps timeout handling and in the emulation
>> observed wall-time synchronized no matter how slow
>> the clock-interrupts are - the emulated wall clock time will be
>> at the same rate.
>
> Right, but I would still rather see the bug fixed than worked around
> this way.
Fixing that requires some more work. But I am surprised that the qemu 
interrupt rate is seemingly somewhat around 50Hz.
Could it be a bug in qemu getting the frequency not right. qemu should
read the clock to get the frequencies right and possibly skip
usleeps less than 1/HZ possibly managing an error-budget. I haven't
looked into qemu at all, but an error of a factor 2 looks suspicious.


Frank



Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Frank Kardel

Could you check which timecounter is used under qemu?

sysctl kern.timecounter.hardware

Usually the timecounters are hardware-based and have no relation
to the clockinterrupt. In case of qemu you might get a good
emulated timecounter, but a suboptimal clockinterrupt.
If this is the case it helps to use the clockinterrupt
itself as timecounter for the wall clock time to avoid a discrepancy 
between clockinterrupt-driven timeout handling and wall-clock time tracking.


sysctl -w kern.timecounter.hardware=clockinterrupt

This is the opposite from deducing the missed clock interrupts
from the wall clock time and keeps timeout handling and in the emulation 
observed wall-time synchronized no matter how slow
the clock-interrupts are - the emulated wall clock time will be
at the same rate.

This might be a workaround for the current qemu issue and does not
affect any discussion about improving sleep timing or
migrating to a tick-less kernel.

BTW: even a tick-less kernel will need a minimum interrupt
frequency in order to avoid undetected timecounter wrapping.
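The wrap constraint is simple arithmetic: an N-bit free-running timecounter at f Hz wraps every 2^N / f seconds, so it must be read more often than that. An illustrative sketch (wrap_interval_us is an invented name; the figures are examples, not properties of any particular timecounter):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds until an N-bit free-running counter at freq_hz wraps.
 * Valid while 2^bits * 1e6 fits in 64 bits (bits <= 43 or so).
 */
static uint64_t
wrap_interval_us(unsigned bits, uint64_t freq_hz)
{
	return (1ULL << bits) * 1000000ULL / freq_hz;
}
```

For example, a 32-bit counter clocked at 100 MHz wraps after roughly 43 seconds, so any interrupt rate of a few per minute already suffices to detect wrap; at 1 GHz the margin shrinks to about 4.3 seconds.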

Frank






Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Robert Elz
Date:Sun, 30 Jul 2017 13:01:50 +0300
From:Andreas Gustafsson 
Message-ID:  <22909.44686.188004.117...@guava.gson.org>

  | I don't think the slowness of qemu's emulation is the actual cause of
  | its inability to simulate clock interrupts at 100 Hz.

Yes, I was wondering about that, as if it was, there'd often be no time
left for anything else...

  | If my theory is correct, there are at least three ways the problem
  | could be fixed:
  | 
  |  - Improve the time resolution of sleeps on the host system,
  |  - Make qemu deal better with hosts unable to sleep for short periods

Either, or both, of those should be fixed, and I might get to take a
look at the first one (the insides of qemu are not all that appealing...)
but

  |  - Make the guest system deal better with missed timer interrupts.

This one needs to be fixed.  An idle system that says it takes 13 seconds
to do a sleep 10 is simply broken.  Fixing the other issues (or either
one of them) would make it much harder to work on this one - that is,
keeping the qemu/host relationship stable provides a platform where the
timekeeping issues in the kernel are known to occur, and so a good way to
verify any fix - so I think this should be fixed first.

kre



Re: kmem_alloc(0, f)

2017-07-30 Thread Martin Husemann
On Sat, Jul 29, 2017 at 02:04:42PM +, Taylor R Campbell wrote:
> This seems like a foot-oriented panic gun, and it's been a source of
> problems in the past.  Can we change it?

I think it is a valuable tool to catch driver bugs early during
development, but wouldn't mind reducing it to a KASSERT.

Martin


Re: Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Andreas Gustafsson
Robert Elz wrote:
> I want to leave /bin/sh to percolate for a while, make sure there are
> no issues with it as it is, before starting on the next round of
> cleanups and bug fixes, so I was looking for something else to poke
> my nose into ...
> 
> [Aside: the people I added to the cc of this message are those who have
>  added text to PR kern/43997 and so I thought might be interested, if you're
>  not, just say...]
> 
> kern/43997 is the "qemu is too slow, clock interrupts get lost, timing
> gets all messed up" problem that plagues many of the ATF tests that kind
> of expect time to be maintained rationally.

Thank you for looking into this.

> Now there's no question that qemu is slow, for example, on my amd64 Xen
> DomU test system, the shell arithmetic test of ++x (etc) takes:
>   var_preinc: [0.077617s] Passed.
> whereas from the latest completed b5 (qemu) test run (as of this e-mail)
>   var_preinc   Passed   N/A   6.200489s
> 
> That's about 80 times slower (and most of the other tests show similar
> factors).   I don't think we can blame qemu for that, given what it is
> doing.
> 
> So, it is hardly surprising that, to borrow Paul's words from the PR:
>   On (at least) amd64 architecture, qemu cannot simulate clock
>   interrupts at 100Hz.

I don't think the slowness of qemu's emulation is the actual cause of
its inability to simulate clock interrupts at 100 Hz.  Rather, I think
it is more likely caused by the inability of qemu to sleep for periods
shorter than 10 ms due to limitations of the underlying host OS, such
as that documented in the BUGS section of nanosleep(2).

That this is at least partly a host system issue is supported by the
observation that when qemu is hosted on a Linux system, the timing in
the NetBSD guest is much more accurate than when qemu is hosted on
NetBSD, on similar hardware:

  NetBSD-on-qemu-on-NetBSD# time sleep 10
 13.00 real 0.00 user 0.03 sys

  NetBSD-on-qemu-on-Linux# time sleep 10
 10.13 real 0.02 user 0.02 sys

If my theory is correct, there are at least three ways the problem
could be fixed:

 - Improve the time resolution of sleeps on the host system, as
   recently discussed on tech-kern in a thread starting with
   http://mail-index.netbsd.org/tech-kern/2017/07/02/msg022024.html

 - Make qemu deal better with hosts unable to sleep for short
   periods of time, or

 - Make the guest system deal better with missed timer interrupts.

-- 
Andreas Gustafsson, g...@gson.org


Understanding PR kern/43997 (kernel timing problems / qemu)

2017-07-30 Thread Robert Elz
I want to leave /bin/sh to percolate for a while, make sure there are
no issues with it as it is, before starting on the next round of
cleanups and bug fixes, so I was looking for something else to poke
my nose into ...

[Aside: the people I added to the cc of this message are those who have
 added text to PR kern/43997 and so I thought might be interested, if you're
 not, just say...]

kern/43997 is the "qemu is too slow, clock interrupts get lost, timing
gets all messed up" problem that plagues many of the ATF tests that kind
of expect time to be maintained rationally.

Now there's no question that qemu is slow, for example, on my amd64 Xen
DomU test system, the shell arithmetic test of ++x (etc) takes:
var_preinc: [0.077617s] Passed.
whereas from the latest completed b5 (qemu) test run (as of this e-mail)
var_preinc   Passed   N/A   6.200489s

That's about 80 times slower (and most of the other tests show similar
factors).   I don't think we can blame qemu for that, given what it is
doing.

So, it is hardly surprising that, to borrow Paul's words from the PR:
On (at least) amd64 architecture, qemu cannot simulate clock
interrupts at 100Hz.
nor that
Therefore, a simple "date ; sleep 5; date" command
actually requires 10 seconds to complete!

This (aside from the workload it creates on b5) shouldn't even really be
an issue; I don't think we have any ATF NTP tests, and if we did, attempting
those in a qemu emulated environment would be insane.

The problem is really (again from the PR)

The routines sleep(3), usleep(3), and nanosleep(2) wake-up based on the 
occurrence of clock ticks.  However, the timer interrupt routine
determines the actual absolute time.

which means that the NetBSD kernel is getting itself out of sync - it is
not maintaining one consistent view of the time for the system it is running.

Whether its time view internally matches the outside reality is not really
a big issue - obviously it is better if it does, at least as close as possible
(without external time sync mechanisms, nothing is perfect) but internally
it really should be consistent.

What's more, at least from the description of the problem, I see nothing that
would prevent the same issue arising (probably on a much smaller scale) on
any system that happened to suffer an interrupt storm (due to either something
broken, some kind of attack, or just a very heavy workload) that happens to
last more than 10ms (on a 100Hz based tick system, 1ms on an alpha with 1024Hz)
and causes a clock tick to be lost.

So, I think qemu is no more than a good environment for simulating the
underlying problem, and not itself in any material way related to the
problem, which is squarely a NetBSD kernel issue.

If there's no disagreement about this analysis, I plan on digging into the
clock/time handling parts of the kernel, and fixing this (whatever it takes...)

My current guess of the "whatever it takes" is that something along the lines
of
we know absolute time (the timer interrupt routine uses it)
we know when the last clock tick happened (we made it happen, we
can remember when that was)
we can calculate how many clock ticks should have been generated in
the intervening period
tick tick tick...

is needed.   But I am yet to delve into the code (this is mostly just from the
PR.)   Note: this can be optimised so that there's very little (though probably
not zero) extra work in the common case where nothing is being missed.
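The plan sketched above - remember the ticks accounted for, compute how many should have happened, fire the difference - might look like this (hypothetical names, a userland model rather than actual NetBSD code; the time source is passed in as absolute nanoseconds since boot):

```c
#include <stdint.h>

#define HZ 100
#define NS_PER_TICK (1000000000ULL / HZ)

static uint64_t ticks_done;	/* ticks accounted for so far */
static uint64_t handler_runs;	/* how often per-tick work ran (demo) */

static void
tick_handler(void)
{
	handler_runs++;		/* timeouts, scheduling stats, ... */
}

/*
 * Run from the timer interrupt, however late it arrives: derive the
 * number of ticks that should have elapsed by now and run the handler
 * once per tick still owed ("tick tick tick...").  In the common case
 * where nothing was missed the loop body runs exactly once.
 */
static void
clock_catchup(uint64_t now_ns)
{
	uint64_t due = now_ns / NS_PER_TICK;

	while (ticks_done < due) {
		tick_handler();
		ticks_done++;
	}
}
```

If an interrupt storm delays the next timer interrupt by several tick periods, the first interrupt afterwards simply makes up the deficit, so the aggregate tick count - and everything timed by it - stays in step with absolute time.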

kre