On Tue, Apr 08, 2025 at 01:13:25PM -0400, Greg Troxel wrote:
> Christof Meerwald <cme...@cmeerw.org> writes:
> 
> > VirtFusion is based on Linux KVM.
> 
> That I have heard of.
> 
> > I am not sure about a missed interrupt (or even anything kernel
> > related) - wouldn't that also affect pings and TCP connections?
> 
> If the disk interrupts were missed, and network interrupts were ok, it
> could be consistent.  I'm just guessing.
> 
> > To me it would feel more like maybe syslogd blocking on something
> > (which might then also block anything trying to log something via
> > syslog)?
> 
> I wouldn't expect syslog clients to block.
> 
> You also haven't described the machine.  If you're doing anything other
> than setting YES in rc.conf to start things at boot, please explain.

Initially I think I really only had sshd=YES there; I have recently
added lighttpd, but that was after the issue in April.

> Also, you could explain the history and frequency.  Did this arise
> recently?

The issue just showed up again on that machine (after 3 months). I
have had that VPS since around September last year, and the issue has
been occurring roughly every 1-3 months.

The lighttpd error log shows:

2025-07-07 02:54:28: (server.c.469) warning: clock jumped 48611 secs
2025-07-07 02:54:28: (server.c.482) clock jumped; attempting graceful restart in < ~5 seconds, else hard restart
2025-07-07 16:24:39: (server.c.1234) [note] graceful shutdown started

So maybe it's related to the timer. BTW, "sysctl kern.timecounter"
shows:

kern.timecounter.choice = TSC(q=-100, f=4491577000 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=1000004000 Hz) piixpm0(q=1000, f=3579545 Hz) hpet0(q=2000, f=100000000 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = hpet0
kern.timecounter.timestepwarnings = 0

My guess is that it could be related to hpet0 here?

I have another NetBSD VPS (running NetBSD current) that doesn't show
this issue and there I have:

kern.timecounter.choice = TSC(q=-100, f=2218696360 Hz) clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=1008509000 Hz) piixpm0(q=1000, f=3579545 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = ACPI-Fast
kern.timecounter.timestepwarnings = 0
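If hpet0 is the suspect, one low-risk experiment would be to switch the
affected VPS to the timecounter the unaffected VPS uses (ACPI-Fast, which
also appears in the affected machine's choice list above) and to enable
step warnings. This is an untested sketch, not a known fix. The same
assignments can be applied at runtime with "sysctl -w name=value". As far
as I understand it, the step warnings only cover explicit clock steps
(e.g. from ntpd or date), so they may not catch a stall of the
timecounter itself:

```
# /etc/sysctl.conf (experiment, not a confirmed fix)
# Use the same timecounter as the VPS that does not show the problem.
kern.timecounter.hardware=ACPI-Fast
# Ask the kernel to log when the clock is stepped.
kern.timecounter.timestepwarnings=1
```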


> > In any case, any idea how to debug this? For now I have attached
> > ktruss to syslogd, but I am not sure how often I might see this issue.
> 
> I would write a script to run ps alxw once a minute and save it to files
> named by date +%s, and see if that results in interesting values in
> wchan on the next incident.

Unfortunately, I didn't have this in place any more, but it looks like
the issue is caused by the timer being stuck. I'm not sure whether the
problem is on the NetBSD side or the qemu/kvm side.
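For next time, a minimal sketch of the snapshot script Greg describes
might look like the following. The function name pslog, its arguments,
the default directory /var/tmp/pslog and the "gaps" file are my own
hypothetical choices, not anything established in this thread. Because
each file is named by date +%s, a gap between consecutive file names
that is much larger than the interval would also timestamp a timer
stall like the one above:

```sh
#!/bin/sh
# Hypothetical helper: pslog DIR INTERVAL COUNT
#   DIR       where snapshots are written (default /var/tmp/pslog)
#   INTERVAL  seconds to sleep between snapshots (default 60)
#   COUNT     number of snapshots to take; 0 means run until killed
pslog() {
    dir=${1:-/var/tmp/pslog}
    every=${2:-60}
    count=${3:-0}
    mkdir -p "$dir" || return 1
    prev=
    taken=0
    while :; do
        now=$(date +%s)
        # One file per snapshot, named by epoch seconds; the wchan
        # column of `ps alxw` shows what each process is waiting on.
        ps alxw > "$dir/$now" 2>&1
        # sleep(1) relies on kernel timers, so if consecutive snapshots
        # are more than twice the interval apart, record the gap: either
        # the loop was stalled or the wall clock jumped forward.
        if [ -n "$prev" ] && [ $((now - prev)) -gt $((every * 2)) ]; then
            echo "gap of $((now - prev))s ending at $now" >> "$dir/gaps"
        fi
        prev=$now
        taken=$((taken + 1))
        if [ "$count" -gt 0 ] && [ "$taken" -ge "$count" ]; then
            break
        fi
        sleep "$every"
    done
}
```

It could be started in the background, e.g. from /etc/rc.local, with
". /path/to/pslog.sh; pslog /var/tmp/pslog 60 &" (the path is a
placeholder).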


Christof

-- 
https://cmeerw.org                             sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org
