> Date: Sun, 30 Jul 2023 14:56:53 -0400
> From: Brad Spencer <b...@anduin.eldar.org>
> 
> Taylor R Campbell <riastr...@netbsd.org> writes:
> 
> > Can you please try running with the attached patch and share the
> > warnings it produces?  Should give slightly more information.
> 
> Caught another one.  As far as I know the system is up to date with all
> of the requested patches:
> 
> [ 19419.647972] WARNING: lwp 16 (system idle/1) flags 0xa0000020: timecounter went backwards from (19420 + 0x9e37cf0149d8f7bb/2^64) sec at netbsd:mi_switch+0x11e on cpu1 to (19419 + 0xad917b77bd0a7cd3/2^64) sec at netbsd:mi_switch+0x11e on cpu1
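
For a sense of scale, the two timestamps in the warning can be
subtracted directly.  The following is only a rough sketch, assuming
each value reads as whole seconds plus a 64-bit binary fraction, as
printed; the helper and variable names are illustrative and not taken
from the kernel:

```python
# Size of the backwards step reported in the WARNING.
# Assumption: each timestamp is (seconds + fraction / 2^64), as printed.
FRAC_BITS = 64


def to_fixed(sec, frac):
    """Pack seconds and a 64-bit fraction into one fixed-point integer."""
    return (sec << FRAC_BITS) | frac


# Values copied from the warning line.
old = to_fixed(19420, 0x9E37CF0149D8F7BB)  # earlier reading
new = to_fixed(19419, 0xAD917B77BD0A7CD3)  # later reading, which is smaller

# Exact integer subtraction first, then convert to seconds for display.
backward_step = (old - new) / 2**FRAC_BITS
print(f"timecounter went backwards by about {backward_step:.3f} s")
```

By this arithmetic the step is a little under one second.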

Can you run this dtrace script for a while (say, for a day, or from
boot until you see the WARNING above, which only happens once per
boot), and then hit ^C?

dtrace -x nolibs -n '
  sdt:xen:hardclock:jump { @ = quantize(arg1 - arg0) }
  sdt:xen:hardclock:jump /arg2 >= 430/ { printf("hardclock jump violated timecounter contract") }'

If my hypothesis is correct, you can just leave this running over any
particular workload and you'll get:

(a) a message printed whenever the hardclock delay is too long, and
(b) when you hit ^C at the end, a histogram of all the >1-tick
    hardclock jump delays.

(Avoiding the tick-10s probe, like I used in the last dtrace
suggestion, means you won't get updates printed every 10sec to your
terminal -- you'll have to hit ^C to see the results -- but as an
upside it won't instantly crash your kernel owing to the Xen/!Xen
module ABI mismatch for CLKF_USERMODE/PC.)
