On Tue, Apr 08, 2025 at 01:13:25PM -0400, Greg Troxel wrote:
> Christof Meerwald <cme...@cmeerw.org> writes:
>
> > VirtFusion is based on Linux KVM.
>
> That I have heard of.
>
> > I am not sure about a missed interrupt (or even anything kernel
> > related) - wouldn't that also affect pings and TCP connections?
>
> If the disk interrupts were missed, and network interrupts were ok, it
> could be consistent.  I'm just guessing.
>
> > To me it would feel more like maybe syslogd blocking on something
> > (which might then also block anything trying to log something via
> > syslog)?
>
> I wouldn't expect syslog clients to block.
>
> You also haven't described the machine.  If you're doing anything other
> than setting YES in rc.conf to start things at boot, please explain.

I think initially I really only had sshd=YES there, but I have recently
added lighttpd (but that was after the issue in April).

> Also, you could explain the history and frequency.  Did this arise
> recently?

The issue just showed up again on that machine (after 3 months) - I have
had that VPS since around September last year, and it has been occurring
maybe every 1-3 months.

The lighttpd error log shows:

2025-07-07 02:54:28: (server.c.469) warning: clock jumped 48611 secs
2025-07-07 02:54:28: (server.c.482) clock jumped; attempting graceful restart in < ~5 seconds, else hard restart
2025-07-07 16:24:39: (server.c.1234) [note] graceful shutdown started

So maybe it's related to the timer. BTW, "sysctl kern.timecounter" shows:

kern.timecounter.choice = TSC(q=-100, f=4491577000 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=1000004000 Hz) piixpm0(q=1000, f=3579545 Hz) hpet0(q=2000, f=100000000 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = hpet0
kern.timecounter.timestepwarnings = 0

My guess would be that it could be related to hpet0 here? I have another
NetBSD VPS (running NetBSD current) that doesn't show this issue, and
there I have:

kern.timecounter.choice = TSC(q=-100, f=2218696360 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=1008509000 Hz) piixpm0(q=1000, f=3579545 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = ACPI-Fast
kern.timecounter.timestepwarnings = 0

> > In any case, any idea how to debug this? For now I have attached
> > ktruss to syslogd, but I am not sure how often I might see this issue.
>
> I would write a script to run ps alxw once a minute and save it to files
> named by date +%s, and see if that results in interesting values in
> wchan on the next incident.

Unfortunately, I didn't have this in place any more, but it looks like
the issue is caused by the timer being stuck.

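For the next occurrence, the once-a-minute "ps alxw" snapshot suggested
above could be set up roughly as follows. This is only a sketch: the
function names, the SNAP_CMD variable and the 5-second tolerance are made
up for illustration, and the snapshot directory is whatever gets passed
in. It also notes when the wall clock moves by something other than the
sleep interval, which is what the lighttpd log hints at.

```sh
#!/bin/sh
# Sketch only: SNAP_CMD, take_snapshot, check_jump and watch_loop are
# illustrative names, not part of NetBSD or of any existing tool.

# Save one snapshot into directory $1, in a file named by epoch seconds.
take_snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    now=$(date +%s)
    # SNAP_CMD defaults to the suggested "ps alxw". It is left unquoted
    # on purpose so the command and its arguments are split into words.
    ${SNAP_CMD:-ps alxw} > "$dir/$now" 2>&1
}

# Print a note if the wall clock did not advance by roughly $3 seconds
# between the timestamps $1 (before) and $2 (after).
check_jump() {
    delta=$(( $2 - $1 - $3 ))
    if [ "$delta" -gt 5 ] || [ "$delta" -lt -5 ]; then
        echo "clock moved $(( $2 - $1 )) s during a $3 s sleep"
    fi
}

# Take a snapshot every $2 seconds (default 60) until killed.
watch_loop() {
    dir=$1
    interval=${2:-60}
    prev=$(date +%s)
    while :; do
        take_snapshot "$dir"
        sleep "$interval"
        cur=$(date +%s)
        check_jump "$prev" "$cur" "$interval" >> "$dir/clock-jumps.log"
        prev=$cur
    done
}
```

After sourcing the file, something like "watch_loop /var/tmp/ps-snapshots 60 &"
would start it (the directory is just an example). A gap in the
snapshot file names would then show when the loop itself stopped making
progress.
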
Not sure if the issue would be on the NetBSD side or the qemu/kvm side.


Christof

-- 
https://cmeerw.org
sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org
xmpp:cmeerw at cmeerw.org