Hello all,
On 31.08.2022 at 11:29, Matthias Petermann wrote:
Hello all,
I have a NetBSD 9.3 host that runs multiple virtual machines (Qemu with
nvmm acceleration). One peculiarity: I use estd on the host to reduce
the power consumption of the whole system via frequency scaling when
there is no load.
The host's time is synchronized via ntpd (with default settings), and
so is the guests' time.
The host's time is correct.
The guests' clocks fall further and further behind the longer they run,
and ntpd in the guests does not seem to compensate for this.
What could be the reason for this? Can estd be a source of interference?
I start the Qemu instances like this:
```
nohup qemu-system-x86_64 -machine pc-q35-7.0 -smp $VM_CORES -m $VM_RAM -accel nvmm \
    -device virtio-balloon-pci,id=balloon0 \
    -k de -boot cd -cdrom $VM_CDROM \
    -machine graphics=off -display none -vga none \
    -object rng-random,filename=/dev/urandom,id=viornd0 \
    -device virtio-rng-pci,rng=viornd0 \
    -object iothread,id=t0 \
    $BLK \
    -device virtio-net-pci,netdev=vioif0,mac=$VM_MAC \
    -netdev tap,id=vioif0,ifname=$VM_NETIF,script=no,downscript=no \
    -chardev socket,id=monitor,path=$MONITOR_SOCKET,server=on,wait=off \
    -monitor chardev:monitor \
    -chardev socket,id=serial0,path=$CONSOLE_SOCKET,server=on,wait=off \
    -serial chardev:serial0 \
    -pidfile /tmp/$VM_ID.pid \
    2>&1 | logger -p local0.notice &
```
Timecounter settings in the guests:
```
net$ sysctl -a |grep timecounter
kern.timecounter.choice = TSC(q=3000, f=1996800000 Hz)
clockinterrupt(q=0, f=100 Hz) ichlpcib0(q=1000, f=3579545 Hz)
hpet0(q=2000, f=100000000 Hz) ACPI-Safe(q=900, f=3579545 Hz)
lapic(q=-100, f=19200000 Hz) i8254(q=100, f=1193182 Hz)
dummy(q=-1000000, f=1000000 Hz)
kern.timecounter.hardware = TSC
kern.timecounter.timestepwarnings = 0
```
All I could find so far is [1], which recommends adding the -rtc switch
to the qemu command line. Has a recommendation emerged in the meantime
as to which setting works best with NetBSD?
I would be very grateful for a short recommendation or a field report.
This is the last remaining problem on my new virtual host, powered by
NetBSD, Qemu, NVMM and ZFS ZVOLs.
Kind regards
Matthias
[1] http://mail-index.netbsd.org/port-amd64/2021/05/09/msg003459.html
After a few days of intermittent successes and just as many setbacks, I
have now found a configuration that works reliably. For anyone who may
face the same problem in the future, here are my notes.
SUMMARY
1) The host kernel is compiled with HZ=1000, while the guest kernel stays
at HZ=100. With this combination the guest clock drifts only slowly, so
ntpd can compensate.
2) On the host, ntpd runs as both client and server and acts as the sole
time source for the guest VMs. Its ntp.conf is the default configuration
except for a restrict directive that allows the VMs' local subnet to
access it.
3) In the guest VMs, ntpd runs as a client with only one time source
(namely, the host). The following non-default settings in ntp.conf are
worth mentioning:
```
tinker panic 0 stepout 30
```
- panic 0: disables the panic threshold, so ntpd does not give up even
on large time offsets
- stepout: the default is 900 seconds; I lower it to 30 so that
deviations large enough to qualify for stepping are corrected relatively
quickly
```
#tos minsane 2
```
- since I use only one time source, I comment this parameter out (you
can just as well set it to 1)
```
server 192.168.2.10 burst minpoll 4 maxpoll 6 true
```
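- use only the one server (the host's IP address)
- the minpoll and maxpoll values have proven appropriate (minpoll 4
instead of the default 6, maxpoll 6 instead of the default 10). Both are
log2 values in seconds, so the host is polled every 16 to 64 seconds
instead of every 64 to 1024 seconds, and deviations are detected earlier.
- burst: when the server is reachable, each poll sends a burst of eight
packets instead of one
- true: the source is always treated as a truechimer, so it survives the
selection and clustering algorithms even though it is the only one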
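For item 2, the extra line in the host's ntp.conf might look like the following minimal sketch. It assumes the VMs live in 192.168.2.0/24, which is inferred only from the host address 192.168.2.10 used by the guests, and the flag selection (nomodify, notrap, nopeer) is an illustrative choice rather than the directive actually used here.

```
# Host /etc/ntp.conf, added to the default configuration.
# ASSUMPTION: the VMs are on 192.168.2.0/24; adjust to the real subnet.
# Hosts on that subnet may get time from this ntpd, but may not
# reconfigure it (nomodify), set traps (notrap) or peer with it (nopeer).
restrict 192.168.2.0 mask 255.255.255.0 nomodify notrap nopeer
```

ntpd applies the most specific matching restrict entry, so a line like this relaxes a stricter `restrict default` entry for the VM subnet only.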
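Putting the settings from item 3 together, the relevant part of a guest's ntp.conf could look like the sketch below. The remaining default lines (driftfile, restrict and so on) are omitted and assumed to stay unchanged; the comments only restate what the bullets above describe.

```
# Guest /etc/ntp.conf (relevant excerpt; default lines omitted)

# Never exit on large offsets (panic 0); step the clock once an
# offset above the step threshold has persisted for 30 s
# (instead of the default 900 s).
tinker panic 0 stepout 30

# The stock file asks for two sane sources; with only one source
# configured, the line is commented out (setting it to 1 also works).
#tos minsane 2

# The host is the sole time source, polled every 2^4 = 16 s up to
# 2^6 = 64 s; burst sends eight packets per poll, and true lets the
# source always survive selection.
server 192.168.2.10 burst minpoll 4 maxpoll 6 true
```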
CONCLUSION
- Due to the root cause (explained well in [1]), the clocks in the VMs
can run slow.
- The root cause can be mitigated by running the host kernel at a higher
HZ than the guest kernel.
- ntpd can be configured to cope well with the remaining deviations, so
installing third-party software (chrony) is not necessarily required.
- With the described ntpd configuration, the clocks are accurate to the
second.
- With chrony I initially obtained a similarly good result, but then
preferred to make do with the tools of the base system (which is why I
gave ntpd another try).
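The host kernel with HZ=1000 from item 1 of the summary could be built from a custom kernel configuration along the lines of the following sketch. The file name HOSTKERNEL is hypothetical, and the sketch assumes the included amd64 GENERIC does not already define HZ; if it does, a `no options HZ` line would be needed before the new value.

```
# sys/arch/amd64/conf/HOSTKERNEL (hypothetical name)
# Start from the stock GENERIC configuration ...
include "arch/amd64/conf/GENERIC"

# ... and raise the clock interrupt rate from the default 100 Hz.
options 	HZ=1000
```

Such a configuration can then be built from the source tree with `./build.sh kernel=HOSTKERNEL` (plus whatever tool and object directory options are normally used).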
OPEN ITEMS
- Build the host modules with HZ=1000 [2] and test whether this fixes
the ZFS initialization issues with an HZ=1000 kernel
Kind regards
Matthias
[1] http://mail-index.netbsd.org/netbsd-users/2022/08/31/msg028896.html
[2] http://mail-index.netbsd.org/netbsd-users/2022/09/05/msg028948.html