> Date: Sun, 30 Jul 2023 14:56:53 -0400
> From: Brad Spencer <b...@anduin.eldar.org>
>
> Taylor R Campbell <riastr...@netbsd.org> writes:
>
> > Can you please try running with the attached patch and share the
> > warnings it produces?  Should give slightly more information.
>
> Caught another one.  As far as I know the system is up to date with all
> of the requested patches:
>
> [ 19419.647972] WARNING: lwp 16 (system idle/1) flags 0xa0000020: timecounter went backwards from (19420 + 0x9e37cf0149d8f7bb/2^64) sec at netbsd:mi_switch+0x11e on cpu1 to (19419 + 0xad917b77bd0a7cd3/2^64) sec at netbsd:mi_switch+0x11e on cpu1
Can you run this dtrace script for a while (say, for a day, or from start of boot until you see the WARNING above which only happens once per boot), and then hit ^C?

    dtrace -x nolibs -n '
      sdt:xen:hardclock:jump { @ = quantize(arg1 - arg0) }
      sdt:xen:hardclock:jump /arg2 >= 430/ {
        printf("hardclock jump violated timecounter contract")
      }'

If my hypothesis is correct, you can just leave this running over any particular workload and you'll get:

(a) a message printed whenever the hardclock delay is too long, and
(b) when you hit ^C at the end, a histogram of all the >1-tick hardclock jump delays.

(Avoiding the tick-10s probe, like I used in the last dtrace suggestion, means you won't get updates printed every 10sec to your terminal -- you'll have to hit ^C to see the results -- but as an upside it won't instantly crash your kernel owing to the Xen/!Xen module ABI mismatch for CLKF_USERMODE/PC.)